ABSTRACT
Sex differences in mammalian complex traits are prevalent and are intimately associated with androgens [1-7]. However, a molecular and cellular profile of sex differences and their modulation by androgens is still lacking. Here we constructed a high-dimensional single-cell transcriptomic atlas comprising over 2.3 million cells from 17 tissues in Mus musculus and explored the effects of sex and androgens on the molecular programs and cellular populations. In particular, we found that sex-biased immune gene expression and immune cell populations, such as group 2 innate lymphoid cells, were modulated by androgens. Integration with the UK Biobank dataset revealed potential cellular targets and risk gene enrichment in antigen presentation for sex-biased diseases. This study lays the groundwork for understanding the sex differences orchestrated by androgens and provides important evidence for targeting the androgen pathway as a broad therapeutic strategy for sex-biased diseases.
Subjects
Androgens , Cells , Sex Characteristics , Single-Cell Analysis , Transcriptome , Animals , Female , Humans , Male , Mice , Androgens/metabolism , Androgens/pharmacology , Antigen Presentation/drug effects , Antigen Presentation/genetics , Immunity, Innate , Lymphocytes/metabolism , Lymphocytes/cytology , Lymphocytes/immunology , Lymphocytes/drug effects , Mice, Inbred C57BL , Transcriptome/drug effects , Transcriptome/genetics , UK Biobank , Cells/drug effects , Cells/immunology , Cells/metabolism
ABSTRACT
Inference of cell-cell communication (CCC) provides valuable information for understanding the mechanisms of many important life processes. With the rise of spatial transcriptomics in recent years, many methods have emerged to predict CCCs using spatial information of cells. However, most existing methods describe CCCs based only on ligand-receptor interactions and do not explore their upstream/downstream pathways. In this paper, we propose a new method to infer CCCs, called Intercellular Gene Association Network (IGAN). Specifically, for the first time, IGAN estimates the gene associations/network between two specific, spatially adjacent single cells. Using IGAN, we can not only infer CCCs accurately but also explore the upstream/downstream pathways of ligands/receptors from the network perspective. These pathways are presented as a panoramic cell-interaction-pathway graph and thus provide extensive information on the regulatory mechanisms behind CCCs. In addition, IGAN can measure CCC activity at single cell/spot resolution and help to discover the spatial heterogeneity of CCC. Interestingly, we found that CCC patterns from IGAN are highly consistent with the spatial microenvironment patterns for each cell type, which further supports the accuracy of our method. Analyses on several public datasets validated the advantages of IGAN.
Subjects
Cell Communication , Gene Regulatory Networks , Cell Communication/genetics , Humans , Computational Biology/methods , Algorithms , Single-Cell Analysis/methods , Signal Transduction
ABSTRACT
Spatially resolved transcriptomics (SRT) has emerged as a powerful tool for investigating gene expression in spatial contexts, providing insights into the molecular mechanisms underlying organ development and disease pathology. However, the expression sparsity poses a computational challenge to integrate other modalities (e.g. histological images and spatial locations) that are simultaneously captured in SRT datasets for spatial clustering and variation analyses. In this study, to meet such a challenge, we propose multi-modal domain adaption for spatial transcriptomics (stMDA), a novel multi-modal unsupervised domain adaptation method, which integrates gene expression and other modalities to reveal the spatial functional landscape. Specifically, stMDA first learns the modality-specific representations from spatial multi-modal data using multiple neural network architectures and then aligns the spatial distributions across modal representations to integrate these multi-modal representations, thus facilitating the integration of global and spatially local information and improving the consistency of clustering assignments. Our results demonstrate that stMDA outperforms existing methods in identifying spatial domains across diverse platforms and species. Furthermore, stMDA excels in identifying spatially variable genes with high prognostic potential in cancer tissues. In conclusion, stMDA as a new tool of multi-modal data integration provides a powerful and flexible framework for analyzing SRT datasets, thereby advancing our understanding of intricate biological systems.
Subjects
Gene Expression Profiling , Transcriptome , Humans , Gene Expression Profiling/methods , Cluster Analysis , Computational Biology/methods , Neural Networks, Computer , Neoplasms/genetics , Algorithms
ABSTRACT
Sequencing-based spatial transcriptomics technologies have revolutionized our understanding of complex biological systems by enabling transcriptome profiling while preserving spatial context. However, spot-level expression measurements often amalgamate signals from diverse cells, obscuring potential heterogeneity. Existing methods aim to deconvolute spatial transcriptomics data into cell type proportions for each spot using single-cell RNA sequencing references but overlook cell-type-specific gene expression, essential for uncovering intra-type heterogeneity. We present PANDA (ProbAbilistic-based decoNvolution with spot-aDaptive cell type signAtures), a novel method that concurrently deciphers spot-level gene expression into both cell type proportions and cell-type-specific gene expression. PANDA integrates archetypal analysis to capture within-cell-type heterogeneity and dynamically learns cell type signatures for each spot during deconvolution. Simulations demonstrate PANDA's superior performance. Applied to real spatial transcriptomics data from diverse tissues, including tumor, brain, and developing heart, PANDA reconstructs spatial structures and reveals subtle transcriptional variations within specific cell types, offering a comprehensive understanding of tissue dynamics.
ABSTRACT
Alerting for imminent earthquakes is particularly challenging due to the high nonlinearity and nonstationarity of geodynamical phenomena. In this study, based on spatiotemporal information (STI) transformation for high-dimensional real-time data, we developed a model-free framework, i.e., real-time spatiotemporal information transformation learning (RSIT), for extending the nonlinear and nonstationary time series. Specifically, by transforming high-dimensional information of the global navigation satellite system into one-dimensional dynamics via the STI strategy, RSIT efficiently utilizes two criteria of the transformed one-dimensional dynamics, i.e., unpredictability and instability. Together, these two criteria signal a potential critical transition of the geodynamical system, thereby providing early-warning signals of possible upcoming earthquakes. RSIT explores both the spatial and temporal dynamics of real-world data on the basis of a solid theoretical background in nonlinear dynamics and delay-embedding theory. The effectiveness of RSIT was demonstrated on geodynamical data of recent earthquakes from a number of regions spanning at least 4 years and through further comparison with existing methods.
ABSTRACT
Advances in single-cell multi-omics technology provide an unprecedented opportunity to fully understand cellular heterogeneity. However, integrating omics data from multiple modalities is challenging due to the individual characteristics of each measurement. Here, to solve this problem, we propose a contrastive and generative deep self-expression model, called single-cell multimodal self-expressive integration (scMSI), which integrates heterogeneous multimodal data into a unified manifold space. Specifically, scMSI first learns each omics-specific latent representation and self-expression relationship using a deep self-expressive generative model, which accounts for the characteristics of the different omics data. Then, scMSI combines these omics-specific self-expression relations through contrastive learning. In this way, scMSI provides a paradigm for integrating multiple omics data even when they are only weakly related, combining representation learning and data integration in a unified framework. We demonstrate that scMSI provides a cohesive solution for a variety of analysis tasks, such as integration analysis, data denoising, batch correction and spatial domain detection. We have applied scMSI to various single-cell and spatial multimodal datasets to validate its effectiveness and robustness across diverse data types and application scenarios.
Subjects
Learning , Multiomics
ABSTRACT
Gene regulatory networks (GRNs) reveal the complex molecular interactions that govern cell state. However, it is challenging to identify causal relations among genes due to noisy data and molecular nonlinearity. Here, we propose a novel causal criterion, neighbor cross-mapping entropy (NME), for inferring GRNs from both steady-state data and time-series data. NME is designed to quantify 'continuous causality', or functional dependency from one variable to another, based on their function continuity with varying neighbor sizes. NME shows superior performance on benchmark datasets compared with existing methods. When applied to scRNA-seq datasets, NME not only reliably inferred GRNs for cell types but also identified cell states. Based on the inferred GRNs and their activity matrices, NME showed better performance in single-cell clustering and downstream analyses. In summary, based on continuous causality, NME provides a powerful tool for inferring causal regulations between genes in GRNs from scRNA-seq data, which can be further exploited to identify novel cell types/states and predict cell type-specific network modules.
Subjects
Algorithms , Gene Regulatory Networks , Entropy , Time Factors , Cluster Analysis
ABSTRACT
Methylation of cytosine to 5-methylcytosine (5mC) is a prevalent DNA modification found in many organisms. Sequential oxidation of 5mC by ten-eleven translocation (TET) dioxygenases results in a cascade of additional epigenetic marks and promotes demethylation of DNA in mammals [1,2]. However, the enzymatic activity and function of TET homologues in other eukaryotes remain largely unexplored. Here we show that the green alga Chlamydomonas reinhardtii contains a 5mC-modifying enzyme (CMD1) that is a TET homologue and catalyses the conjugation of a glyceryl moiety to the methyl group of 5mC through a carbon-carbon bond, resulting in two stereoisomeric nucleobase products. The catalytic activity of CMD1 requires Fe(II) and the integrity of its binding motif His-X-Asp, which is conserved in Fe-dependent dioxygenases [3]. However, unlike previously described TET enzymes, which use 2-oxoglutarate as a co-substrate [4], CMD1 uses L-ascorbic acid (vitamin C) as an essential co-substrate. Vitamin C donates the glyceryl moiety to 5mC with concurrent formation of glyoxylic acid and CO2. The vitamin-C-derived DNA modification is present in the genome of wild-type C. reinhardtii but at a substantially lower level in a CMD1 mutant strain. The fitness of CMD1 mutant cells during exposure to high light levels is reduced. LHCSR3, a gene that is critical for the protection of C. reinhardtii from photo-oxidative damage under high light conditions, is hypermethylated and downregulated in CMD1 mutant cells compared to wild-type cells, causing a reduced capacity for photoprotective non-photochemical quenching. Our study thus identifies a eukaryotic DNA base modification that is catalysed by a divergent TET homologue and unexpectedly derived from vitamin C, and describes its role as a potential epigenetic mark that may counteract DNA methylation in the regulation of photosynthesis.
Subjects
5-Methylcytosine/metabolism , Algal Proteins/metabolism , Ascorbic Acid/metabolism , Biocatalysis , Chlamydomonas reinhardtii/enzymology , DNA/chemistry , DNA/metabolism , 5-Methylcytosine/chemistry , Carbon Dioxide/metabolism , DNA Methylation , Glyoxylates/metabolism , Nucleosides/chemistry , Nucleosides/metabolism , Photosynthesis
ABSTRACT
Spatial transcriptomics characterizes gene expression profiles while retaining the information of the spatial context, providing an unprecedented opportunity to understand cellular systems. One of the essential tasks in such data analysis is to determine spatially variable genes (SVGs), which demonstrate spatial expression patterns. Existing methods only consider genes individually and fail to model the inter-dependence of genes. To this end, we present an analytic tool, STAMarker, for robustly determining spatial domain-specific SVGs with saliency maps in deep learning. STAMarker is a three-stage ensemble framework consisting of graph-attention autoencoders, multilayer perceptron (MLP) classifiers, and saliency map computation by the backpropagated gradient. We illustrate the effectiveness of STAMarker and compare it with several commonly used competing methods on various spatial transcriptomic data generated by different platforms. STAMarker considers all genes at once and is more robust when the dataset is very sparse. STAMarker can identify spatial domain-specific SVGs for characterizing spatial domains and enables in-depth analysis of the region of interest in the tissue section.
Subjects
Deep Learning , Gene Expression Profiling , Data Analysis , Neural Networks, Computer , Transcriptome
ABSTRACT
Glycine receptors (GlyR) conduct inhibitory glycinergic neurotransmission in the spinal cord and the brainstem. They play an important role in muscle tone, motor coordination, respiration, and pain perception. However, the mechanism underlying GlyR activation remains unclear. There are five potential glycine binding sites in α1 GlyR, and different binding patterns may cause distinct activation or desensitization behaviors. In this study, we investigated the coupling of protein conformational changes and glycine binding events to elucidate the influence of binding patterns on the activation and desensitization processes of α1 GlyRs. Subsequently, we explored the energetic distinctions between the apical and lateral pathways during α1 GlyR conduction to identify the pivotal factors in the ion conduction pathway preference. Moreover, we predicted the mutational effects of the key residues and verified our predictions using electrophysiological experiments. For the mutants that can be activated by glycine, the predictions of the mutational directions were all correct. The strength of the mutational effects was assessed using Pearson's correlation coefficient, yielding a value of -0.77 between the calculated highest energy barriers and experimental maximum current amplitudes. These findings contribute to our understanding of GlyR activation, identify the key residues of GlyRs, and provide guidance for mechanistic studies on other pLGICs.
Subjects
Glycine , Receptors, Glycine , Receptors, Glycine/metabolism , Receptors, Glycine/chemistry , Humans , Glycine/chemistry , Glycine/metabolism , Binding Sites , Mutation , Protein Conformation , Models, Molecular
ABSTRACT
The progression of complex diseases can generally be divided into three states: the normal state, the predisease (critical) state and the disease state. The sudden deterioration of a disease can be viewed as a bifurcation or a critical transition. Therefore, finding the tipping point or critical state is of great importance for preventing disease deterioration. However, it is still challenging to detect the critical states of complex diseases from high-dimensional data, especially for a single individual. In this study, we develop a new method based on the network fluctuation of molecules, temporal network flow entropy (TNFE) or temporal differential network flow entropy, to detect the critical states of complex diseases for each individual. By applying this method to a simulated dataset and six real diseases, including respiratory viral infections and tumors with four time-course and two stage-course high-dimensional omics datasets, the critical states before deterioration were detected and their dynamic network biomarkers were identified successfully. The results on the simulated dataset indicate that the TNFE method is robust under different noise strengths and is superior to the existing methods at detecting the critical states. Moreover, the analysis of the real datasets demonstrated the effectiveness of TNFE in providing early-warning signals for various diseases. In addition, we predicted disease deterioration risk and identified drug targets for cancers based on stage-wise data.
Subjects
Neoplasms , Biomarkers , Disease Progression , Disease Susceptibility , Entropy , Humans , Neoplasms/genetics
ABSTRACT
The integration of multi-omics data makes it possible to understand complex biological organisms at the system level. Numerous integration approaches have been developed by assuming a common underlying data space. However, the noise and heterogeneity of biological data greatly affect the performance of these approaches. In this work, we propose a novel deep neural network architecture, named Deep Latent Space Fusion (DLSF), which integrates multi-omics data by learning a consistent manifold in the sample latent space for disease subtype identification. DLSF is built upon a cycle autoencoder with a shared self-expressive layer, which can naturally and adaptively merge nonlinear features at each omics level into one unified sample manifold and produce an adaptive representation of heterogeneous samples at the multi-omics level. We have assessed DLSF on various biological and biomedical datasets to validate its effectiveness. Compared with other state-of-the-art methods, DLSF efficiently and accurately captures the intrinsic manifold of the sample structures or sample clusters, and it yielded more significant results for biological significance, survival prognosis and clinical relevance when applied to cancer data from The Cancer Genome Atlas. Notably, as an in-depth case study, we identified a new molecular subtype of kidney renal clear cell carcinoma that may benefit from immunotherapy from a multi-omics viewpoint, and we further found potential subtype-specific biomarkers in multiple omics data, which were validated on independent datasets. In addition, we applied DLSF to identify potential therapeutic agents for different molecular subtypes of chronic lymphocytic leukemia, demonstrating the scalability of DLSF across diverse omics data types and application scenarios.
Subjects
Neoplasms , Humans , Neoplasms/genetics
ABSTRACT
Identifying differential genes over conditions provides insights into the mechanisms of biological processes and disease progression. Here we present an approach, the Kullback-Leibler divergence-based differential distribution (klDD), which provides a flexible framework for quantifying changes in higher-order statistical information of genes including mean and variance/covariation. The method can well detect subtle differences in gene expression distributions in contrast to mean or variance shifts of the existing methods. In addition to effectively identifying informational genes in terms of differential distribution, klDD can be directly applied to cancer subtyping, single-cell clustering and disease early-warning detection, which were all validated by various benchmark datasets.
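As a rough illustration of why a divergence-based score can detect changes that a mean comparison misses, the sketch below fits a univariate Gaussian to a gene's expression in each condition and computes a symmetrised Kullback-Leibler divergence in closed form. This is only an assumed simplification for intuition, not the klDD implementation: the Gaussian fit, the function names and the toy values are all hypothetical.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(P || Q) for two univariate Gaussians (variances must be > 0)."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def symmetric_kl_score(case, control):
    """Hypothetical per-gene score: KL(case || control) + KL(control || case)."""
    m1, v1 = fmean(case), pvariance(case)
    m2, v2 = fmean(control), pvariance(control)
    return gaussian_kl(m1, v1, m2, v2) + gaussian_kl(m2, v2, m1, v1)


# Toy expression values for one gene (illustrative only).
control = [4.0, 5.0, 6.0, 5.0, 5.0]          # mean 5, small spread
case_same_mean = [1.0, 5.0, 9.0, 3.0, 7.0]   # mean 5, much larger spread

# The means are identical, so a mean-shift test sees nothing,
# but the divergence is clearly positive because the variance changed.
score = symmetric_kl_score(case_same_mean, control)
```

In this toy case the two groups share the same mean, so the score is driven entirely by the change in variance. Covariation between genes would need a multivariate version, which is not shown here.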
Subjects
Gene Expression Profiling , Transcriptome , Cluster Analysis , Disease Progression , Gene Expression Profiling/methods , Humans
ABSTRACT
MOTIVATION: Simultaneous profiling of multi-omics single-cell data represents an exciting technological advance for understanding cellular states and heterogeneity. Cellular indexing of transcriptomes and epitopes by sequencing allows parallel quantification of cell-surface protein expression and transcriptome profiling in the same cells; methylome and transcriptome sequencing from single cells allows transcriptomic and epigenomic profiling in the same individual cells. However, there is a growing need for effective integration methods that can mine cellular heterogeneity from noisy, sparse, and complex multi-modal data. RESULTS: In this article, we propose scHoML, a multi-modal high-order neighborhood Laplacian matrix optimization framework for integrating multi-omics single-cell data. A hierarchical clustering method is presented for analyzing the optimal embedding representation and identifying cell clusters in a robust manner. By integrating high-order and multi-modal Laplacian matrices, this method robustly represents complex data structures and allows systematic analysis at the multi-omics single-cell level, thus promoting further biological discoveries. AVAILABILITY AND IMPLEMENTATION: Matlab code is available at https://github.com/jianghruc/scHoML.
Subjects
Algorithms , Multiomics , Gene Expression Profiling , Transcriptome , Cluster Analysis , Single-Cell Analysis
ABSTRACT
The recent advances in single-cell RNA sequencing (scRNA-seq) techniques have stimulated efforts to identify and characterize the cellular composition of complex tissues. With the advent of various sequencing techniques, automated cell-type annotation using a well-annotated scRNA-seq reference becomes popular. But it relies on the diversity of cell types in the reference, which may not capture all the cell types present in the query data of interest. There are generally unseen cell types in the query data of interest because most data atlases are obtained for different purposes and techniques. Identifying previously unseen cell types is essential for improving annotation accuracy and uncovering novel biological discoveries. To address this challenge, we propose mtANN (multiple-reference-based scRNA-seq data annotation), a new method to automatically annotate query data while accurately identifying unseen cell types with the aid of multiple references. Key innovations of mtANN include the integration of deep learning and ensemble learning to improve prediction accuracy, and the introduction of a new metric that considers three complementary aspects to distinguish between unseen cell types and shared cell types. Additionally, we provide a data-driven method to adaptively select a threshold for identifying previously unseen cell types. We demonstrate the advantages of mtANN over state-of-the-art methods for unseen cell-type identification and cell-type annotation on two benchmark dataset collections, as well as its predictive power on a collection of COVID-19 datasets. The source code and tutorial are available at https://github.com/Zhangxf-ccnu/mtANN.
Subjects
Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , COVID-19/diagnosis , Software
ABSTRACT
ZFP57 is a master regulator of genomic imprinting. It has both maternal and zygotic functions that are partially redundant in maintaining DNA methylation at some imprinting control regions (ICRs). In this study, we found that DNA methylation was lost at most known ICRs in Zfp57 mutant embryos. Furthermore, loss of ZFP57 caused loss of parent-of-origin-dependent monoallelic expression of the target imprinted genes. The allelic expression switch occurred in the ZFP57 target imprinted genes upon loss of differential DNA methylation at the ICRs in Zfp57 mutant embryos. Specifically, upon loss of ZFP57, the alleles of the imprinted genes located on the same chromosome with the originally methylated ICR switched their expression to mimic their counterparts on the other chromosome with unmethylated ICR. Consistent with our previous study, ZFP57 could regulate the NOTCH signaling pathway in mouse embryos by impacting allelic expression of a few regulators in the NOTCH pathway. In addition, the imprinted Dlk1 gene that has been implicated in the NOTCH pathway was significantly down-regulated in Zfp57 mutant embryos. Our allelic expression switch models apply to the examined target imprinted genes controlled by either maternally or paternally methylated ICRs. Our results support the view that ZFP57 controls imprinted expression of its target imprinted genes primarily through maintaining differential DNA methylation at the ICRs.
Subjects
Alleles , Genomic Imprinting , Repressor Proteins/genetics , Animals , DNA Methylation/genetics , Embryo, Mammalian/metabolism , Female , Mice , RNA-Seq , Receptors, Notch/metabolism , Repressor Proteins/metabolism , Signal Transduction/genetics
ABSTRACT
To explore the potential network markers and related signaling pathways of human B cells infected by COVID-19, we performed standardized integration and analysis of single-cell sequencing data to construct conditional cell-specific networks (CCSN) for each cell. Then the peripheral blood cells were clustered and annotated based on the conditional network degree matrix (CNDM) and gene expression matrix (GEM), respectively, and B cells were selected for further analysis. Besides, based on the CNDM of B cells, the hub genes and 'dark' genes (a gene has a significant difference between case and control samples not in a gene expression level but in a conditional network degree level) closely related to COVID-19 were revealed. Interestingly, some of the 'dark' genes and differential degree genes (DDGs) encoded key proteins in the JAK-STAT pathway, which had antiviral effects. The protein p21 encoded by the 'dark' gene CDKN1A was a key regulator for the COVID-19 infection-related signaling pathway. Elevated levels of proteins encoded by some DDGs were directly related to disease severity of patients with COVID-19. In short, the proteins encoded by 'dark' genes complement some missing links in COVID-19 and these signaling pathways played an important role in the growth and activation of B cells.
Subjects
COVID-19 , Signal Transduction , Humans , Signal Transduction/genetics , Janus Kinases/genetics , STAT Transcription Factors/genetics , COVID-19/genetics , Gene Regulatory Networks , Gene Expression Profiling
ABSTRACT
Simultaneous profiling of transcriptomic and chromatin accessibility information in the same individual cells offers an unprecedented resolution for understanding cell states. However, computationally effective methods for integrating these inherently sparse and heterogeneous data are lacking. Here, we present a single-cell multimodal variational autoencoder model, which combines three types of joint-learning strategies with a probabilistic Gaussian Mixture Model to learn the joint latent features that accurately represent these multilayer profiles. Studies on both simulated and real datasets demonstrate that it performs better at (i) dissecting cellular heterogeneity in the joint-learning space, (ii) denoising and imputing data and (iii) constructing the association between multilayer omics data, which can be used for understanding transcriptional regulatory mechanisms.
Subjects
Chromatin/metabolism , Databases, Factual , Deep Learning , Models, Biological , Single-Cell Analysis , Transcriptome , Chromatin/genetics , Humans , K562 Cells
ABSTRACT
A single-sample network (SSN) is a biological molecular network constructed from single-sample data given a reference dataset and can provide insights into the mechanisms of individual diseases and aid in the development of personalized medicine. In this study, we proposed a computational method, a partial correlation-based single-sample network (P-SSN), which not only infers a network from each single-sample data given a reference dataset but also retains the direct interactions by excluding indirect interactions (https://github.com/hyhRise/P-SSN). By applying P-SSN to analyze tumor data from the Cancer Genome Atlas and single cell data, we validated the effectiveness of P-SSN in predicting driver mutation genes (DMGs), producing network distance, identifying subtypes and further classifying single cells. In particular, P-SSN is highly effective in predicting DMGs based on single-sample data. P-SSN is also efficient for subtyping complex diseases and for clustering single cells by introducing network distance between any two samples.
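The idea of retaining direct interactions while excluding indirect ones can be illustrated with a first-order partial correlation: the correlation between two genes after the linear effect of a third is removed. The sketch below is a minimal illustration under stated assumptions, not the P-SSN algorithm (which builds a network per sample against a reference dataset). The function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: gene z drives both x and y; x and y have no direct link.
z = [1, 1, 1, 1, -1, -1, -1, -1]
noise_x = [1, 1, -1, -1, 1, 1, -1, -1]   # orthogonal to z and to noise_y
noise_y = [1, -1, 1, -1, 1, -1, 1, -1]
x = [2 * a + b for a, b in zip(z, noise_x)]
y = [2 * a + b for a, b in zip(z, noise_y)]

r_marginal = pearson(x, y)        # high: the indirect association through z
r_direct = partial_corr(x, y, z)  # about zero once z is accounted for
```

In this constructed example the marginal correlation between x and y is strong only because both depend on z; conditioning on z removes it. This is the sense in which a partial-correlation network drops indirect edges.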