ABSTRACT
Dynamic compartmentalization of eukaryotic DNA into active and repressed states enables diverse transcriptional programs to arise from a single genetic blueprint, whereas its dysregulation has been strongly linked to a broad spectrum of diseases. While single-cell Hi-C experiments allow chromosome conformation profiling across many cells, they remain expensive and out of reach for most labs. Here, we propose an alternative approach, scENCORE, to computationally reconstruct chromatin compartments from more affordable and widely accessible single-cell epigenetic data. First, scENCORE constructs a long-range epigenetic correlation graph to mimic chromatin interaction frequencies, where nodes represent genome bins and edges represent their correlations. Then, it learns node embeddings to cluster genome regions into A/B compartments and aligns different graphs to quantify chromatin conformation changes across conditions. Benchmarking against cell-type-matched Hi-C experiments demonstrates that scENCORE can robustly reconstruct A/B compartments in a cell-type-specific manner. Furthermore, our chromatin conformation switching studies highlight extensive compartment-switching events that may introduce substantial regulatory and transcriptional changes in psychiatric disease. In summary, scENCORE allows accurate and cost-effective A/B compartment reconstruction to delineate higher-order chromatin structure heterogeneity in complex tissues.
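The two steps named in this abstract (a bin-by-bin correlation graph, then a two-way split of the bins) can be illustrated with a small sketch. It is not the scENCORE implementation: the abstract describes learned node embeddings followed by clustering, whereas this sketch swaps in the sign of the leading eigenvector of the correlation matrix as a simple stand-in. The function names and the toy signal matrix are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length signal vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0

def correlation_graph(bin_signal):
    """Dense weighted adjacency: entry [i][j] is the correlation between bins i and j."""
    n = len(bin_signal)
    return [[pearson(bin_signal[i], bin_signal[j]) for j in range(n)] for i in range(n)]

def leading_eigenvector(matrix, iterations=200):
    """Power iteration from a fixed, non-uniform start vector (assumption: not orthogonal to the answer)."""
    n = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec

def split_compartments(bin_signal):
    """Label each bin 0 or 1 by the sign of its leading-eigenvector entry (stand-in for embedding + clustering)."""
    vec = leading_eigenvector(correlation_graph(bin_signal))
    return [0 if v >= 0 else 1 for v in vec]

# Hypothetical toy input: 6 genome bins (rows) measured across 4 cells (columns).
toy_signal = [
    [1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5],   # bins that rise together
    [4, 3, 2, 1], [8, 6, 4, 2], [5, 3, 2, 1],   # bins that fall together
]
labels = split_compartments(toy_signal)
```

The two labels are arbitrary: the sketch only groups bins whose signals co-vary, and deciding which group corresponds to the A or the B compartment would need additional information that is not modeled here.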
Subject(s)
Chromatin , Chromosomes , Chromatin/genetics , DNA , Molecular Conformation , Epigenesis, Genetic
ABSTRACT
MOTIVATION: Recent advances in spatial transcriptomics allow spatially resolved gene expression measurements with cellular or even sub-cellular resolution, directly characterizing the complex spatiotemporal gene expression landscape and cell-to-cell interactions in their native microenvironments. Due to technology limitations, most spatial transcriptomic technologies still yield incomplete expression measurements with excessive missing values. Therefore, gene imputation is critical to filling in missing data, enhancing resolution, and improving overall interpretability. However, existing methods either require additional matched single-cell RNA-seq data, which is rarely available, or ignore spatial proximity or expression similarity information. RESULTS: To address these issues, we introduce Impeller, a path-based heterogeneous graph learning method for spatial transcriptomic data imputation. Impeller has two characteristics that distinguish it from existing approaches. First, it builds a heterogeneous graph with two types of edges representing spatial proximity and expression similarity. Impeller can therefore simultaneously model smooth gene expression changes across spatial dimensions and capture similar gene expression signatures of distant cells of the same type. Second, Impeller incorporates both short- and long-range cell-to-cell interactions (e.g. via paracrine and endocrine signaling) by stacking multiple GNN layers. We use a learnable path operator in Impeller to avoid the over-smoothing issue of traditional Laplacian matrices. Extensive experiments on diverse datasets from three popular platforms and two species demonstrate the superiority of Impeller over various state-of-the-art imputation methods. AVAILABILITY AND IMPLEMENTATION: The code and preprocessed data used in this study are available at https://github.com/aicb-ZhangLabs/Impeller and https://zenodo.org/records/11212604.
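The heterogeneous graph with two edge types described in this abstract can be illustrated as follows. This is a minimal sketch, not Impeller's code: the abstract does not say how edges are chosen, so k-nearest-neighbour rules (Euclidean distance for spatial edges, cosine distance for expression edges) are assumed here, and all function names and toy values are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two spot coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity of two expression vectors (1.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def k_nearest_edges(vectors, k, distance):
    """Undirected edge set linking every item to its k nearest neighbours (ties broken by index)."""
    edges = set()
    for i, vec in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        ranked = sorted(others, key=lambda j: (distance(vec, vectors[j]), j))
        for j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def build_heterogeneous_graph(coords, expression, k_spatial=1, k_expression=1):
    """One node set (spots), two edge types kept separately."""
    return {
        "spatial": k_nearest_edges(coords, k_spatial, euclidean),
        "expression": k_nearest_edges(expression, k_expression, cosine_distance),
    }

# Hypothetical toy data: spots 0 and 3 are far apart but share an expression profile.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
expression = [[5, 0, 1], [0, 5, 1], [0, 4, 1], [6, 0, 1]]
graph = build_heterogeneous_graph(coords, expression)
```

In the toy data the spatial edges join physical neighbours, while the expression edges join spots 0 and 3, which lie far apart: the kind of long-range, same-type link that spatial proximity alone would miss.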
Subject(s)
Transcriptome , Transcriptome/genetics , Algorithms , Gene Expression Profiling/methods , Humans , Software , Computational Biology/methods , Machine Learning , Single-Cell Analysis/methods
ABSTRACT
BACKGROUND: Alzheimer's disease (AD) is a devastating neurodegenerative disorder affecting 44 million people worldwide, leading to cognitive decline, memory loss, and significant impairment in daily functioning. Recent single-cell sequencing technology has revolutionized genetic and genomic resolution by enabling scientists to explore the diversity of gene expression patterns at the finest resolution. Most existing studies have focused solely on molecular perturbations within each cell, but cells live in microenvironments rather than as isolated entities. Here, we leveraged large-scale, publicly available single-nucleus RNA sequencing (snRNA-seq) data from the human prefrontal cortex to investigate cell-to-cell communication in healthy brains and its perturbations in AD. We uniformly processed the snRNA-seq data with strict QC and labeled canonical cell types consistent with the definitions from the BRAIN Initiative Cell Census Network. From ligand and receptor gene expression, we built a high-confidence cell-to-cell communication network to investigate signaling differences between AD and healthy brains. RESULTS: First, we performed broad communication pattern analyses, which highlight that biologically related cell types in normal brains rely on largely overlapping signaling networks and that the AD brain exhibits irregular inter-mixing of cell types and signaling pathways. Second, we performed a more focused cell-type-centric analysis and found that excitatory neurons in AD have significantly increased their communication with inhibitory neurons, while inhibitory neurons and other non-neuronal cells have globally decreased theirs with all cells. Third, we took a signaling-centric view, showing that the canonical signaling pathways CSF, TGFβ, and CX3C are significantly dysregulated in their signaling to microglia/PVM cells, as is the WNT pathway in its signaling from endothelial to neuronal cells. Finally, after extracting 23 known AD risk genes, our intracellular communication analysis revealed a strong connection of the extracellular ligand genes APP, APOE, and PSEN1 to the intracellular AD risk genes TREM2, ABCA1, and APP in the communication from astrocytes and microglia to neurons. CONCLUSIONS: In summary, using recent advances in single-cell sequencing technologies, we show that cellular signaling is regulated in a cell-type-specific manner and that improper regulation of extracellular signaling genes is linked to intracellular risk genes, giving a mechanistic intra- and inter-cellular picture of AD.
Subject(s)
Alzheimer Disease , Cell Communication , Single-Cell Analysis , Transcriptome , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Alzheimer Disease/pathology , Humans , Cell Communication/physiology , Single-Cell Analysis/methods , Brain/metabolism , Brain/pathology , Prefrontal Cortex/metabolism , Neurons/metabolism , Signal Transduction/physiology , Signal Transduction/genetics
ABSTRACT
Different genes form complex networks within cells to carry out critical cellular functions, and alterations to these networks can introduce downstream transcriptome perturbations and phenotypic variations. Therefore, developing efficient and interpretable methods to quantify network changes and pinpoint driver genes across conditions is crucial. We propose a hierarchical graph representation learning method called iHerd. Given a set of networks, iHerd first hierarchically generates a series of coarsened sub-graphs in a data-driven manner, representing network modules at different resolutions (e.g., the level of signaling pathways). Then, it sequentially learns low-dimensional node representations at all hierarchical levels via efficient graph embedding. Lastly, iHerd projects separate gene embeddings onto the same latent space in its graph alignment module to calculate a rewiring index for driver gene prioritization. To demonstrate its effectiveness, we applied iHerd to a tumor-to-normal GRN rewiring analysis and a cell-type-specific GCN analysis using single-cell multiome data of the brain. We showed that iHerd can effectively pinpoint novel and well-known risk genes in different diseases. Distinct from existing models, iHerd's graph coarsening for hierarchical learning allows us to classify network driver genes into early and late divergent genes (EDGs and LDGs), emphasizing genes with extensive network changes across and within signaling pathway levels. This approach to driver gene classification can provide deeper molecular insights. The code is freely available at https://github.com/aicb-ZhangLabs/iHerd. All other relevant data are within the manuscript and supporting information files.
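The alignment step in this abstract (projecting separate gene embeddings onto one latent space, then scoring how far each gene moved) can be illustrated with a toy sketch. It is not iHerd's alignment module: it assumes two-dimensional embeddings and a rotation-only alignment about the origin, for which the least-squares angle has a closed form, and it takes the post-alignment distance as a stand-in rewiring score. Function names, gene names, and coordinates are hypothetical.

```python
import math

def best_rotation_angle(source, target, genes):
    """Angle that rotates `source` onto `target` with least squared error (2-D, rotation only)."""
    dot = sum(source[g][0] * target[g][0] + source[g][1] * target[g][1] for g in genes)
    cross = sum(source[g][0] * target[g][1] - source[g][1] * target[g][0] for g in genes)
    return math.atan2(cross, dot)

def rewiring_scores(source, target):
    """Distance between each gene's aligned source embedding and its target embedding."""
    genes = sorted(set(source) & set(target))
    theta = best_rotation_angle(source, target, genes)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    scores = {}
    for g in genes:
        x, y = source[g]
        aligned_x = cos_t * x - sin_t * y
        aligned_y = sin_t * x + cos_t * y
        scores[g] = math.hypot(aligned_x - target[g][0], aligned_y - target[g][1])
    return scores

# Hypothetical embeddings: condition B is condition A rotated by 90 degrees,
# except g4, which moved to a different position in the network.
condition_a = {"g1": (1.0, 0.0), "g2": (0.0, 1.0), "g3": (-1.0, 0.0), "g4": (0.0, -1.0)}
condition_b = {"g1": (0.0, 1.0), "g2": (-1.0, 0.0), "g3": (0.0, -1.0), "g4": (-1.0, 0.0)}

scores = rewiring_scores(condition_a, condition_b)
top_gene = max(scores, key=scores.get)
```

Aligning first matters because separately learned embeddings are only defined up to transformations such as rotation. Without the alignment, every gene in the toy data would appear to have moved, while after it only g4 stands out.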
Subject(s)
Deep Learning , Brain , Learning , Records
ABSTRACT
Teleost fins have a strong regenerative capacity, which motivates further study of how gene expression is regulated during fin regeneration. This research focuses on miRNAs, which are key post-transcriptional regulatory molecules. In this study, a miRNA library for zebrafish fin regeneration was constructed to reveal the differential expression of miRNAs during fin regeneration and to explore the regulatory pathways involved. Following the injection of miRNA agomir into zebrafish, the proliferation of blastema cells and the overall fin regeneration area were significantly reduced. The miRNAs impaired blastocyte formation, and thereby fin regeneration, by inhibiting the expression of genes and proteins associated with blastocyte formation (including yap1 and Smad1/5/9), an effect associated with the Hippo pathway. Furthermore, the miRNAs impaired the patterning and mineralization of newly formed fin rays. The miRNAs influenced fin regeneration by inhibiting the expression of a range of bone-related genes and proteins in osteoblast lineages, including sp7, runx2a, and runx2b. This study provides a valuable reference for the further exploration of morphological bone reconstruction in aquatic vertebrates.
Subject(s)
Animal Fins , MicroRNAs , Regeneration , Zebrafish Proteins , Zebrafish , Animals , MicroRNAs/genetics , Zebrafish/genetics , Animal Fins/physiology , Animal Fins/metabolism , Regeneration/genetics , Zebrafish Proteins/genetics , Zebrafish Proteins/metabolism , Cell Proliferation/genetics , Gene Expression Regulation , YAP-Signaling Proteins/metabolism , YAP-Signaling Proteins/genetics , Signal Transduction , Osteoblasts/metabolism , Sp7 Transcription Factor
ABSTRACT
Early and accurate detection of viruses in clinical and environmental samples is essential for effective public healthcare, treatment, and therapeutics. While PCR detects potential pathogens with high sensitivity, it is difficult to scale and requires knowledge of the exact sequence of the pathogen. With the advent of next-generation single-cell sequencing, it is now possible to scrutinize viral transcriptomics at the finest possible resolution: individual cells. This newfound ability to investigate individual cells opens new avenues to understand viral pathophysiology with unprecedented resolution. To leverage this ability, we propose an efficient and accurate computational pipeline, named Venus, for virus detection and integration site discovery in both single-cell and bulk-tissue RNA-seq data. Specifically, Venus addresses two main questions: whether a tissue or cell type is infected by any virus or by a virus of interest, and, if so, whether and where the virus has inserted itself into the human genome. Our analysis can be broken into two parts: validation and discovery. First, for validation, we applied Venus to well-studied viral datasets, such as HBV-associated hepatocellular carcinoma and HIV infection treated with antiretroviral therapy. Second, for discovery, we analyzed datasets such as HIV-infected neurological patients and deeply sequenced T-cells. We detected viral transcripts in a novel target, the brain, and high-confidence integration sites in immune cells. In conclusion, we describe Venus, a publicly available software tool that we believe will be valuable for virus investigation by the scientific community at large.
Subject(s)
HIV Infections , Liver Neoplasms , Viruses , Humans , RNA-Seq , Sequence Analysis, RNA , Software
ABSTRACT
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type-specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
Subject(s)
Brain , Gene Regulatory Networks , Mental Disorders , Single-Cell Analysis , Humans , Aging/genetics , Brain/metabolism , Cell Communication/genetics , Chromatin/metabolism , Chromatin/genetics , Genomics , Mental Disorders/genetics , Prefrontal Cortex/metabolism , Prefrontal Cortex/physiology , Quantitative Trait Loci
ABSTRACT
Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet, little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multi-omics datasets into a resource comprising >2.8M nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550K cell-type-specific regulatory elements and >1.4M single-cell expression-quantitative-trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.
ABSTRACT
Motivation: Recent initiatives for federal grant transparency allow direct knowledge extraction from large volumes of grant texts, serving as a powerful alternative to traditional surveys. However, computational modeling of grant text is challenging because grants are usually multifaceted and their topics constantly evolve. Results: We propose Turtling, a time-aware neural topic model with three distinguishing characteristics. First, Turtling employs pretrained biomedical word embeddings to extract research topics. Second, it leverages a probabilistic time-series model to allow smooth and coherent topic evolution. Lastly, Turtling adds a topic diversity loss and a funding institute classification loss to improve topic quality and facilitate funding institute prediction. We apply Turtling to publicly available NIH grant text and show that it significantly outperforms other methods on topic quality metrics. We also demonstrate that Turtling can provide insights into research topic evolution by detecting topic trends across decades. In summary, Turtling may be a valuable tool for grant text analysis. Availability and implementation: Turtling is freely available as open-source software at https://github.com/aicb-ZhangLabs/Turtling.
ABSTRACT
In recent years, multivariate time-series classification (MTSC) has attracted considerable attention owing to advances in sensing technology. Existing deep-learning-based MTSC techniques, which mostly rely on convolutional or recurrent neural networks, focus primarily on the temporal dependency of a single time series. In contrast, complex pairwise dependencies among the variables can be better described using graph methods, where each variable is regarded as a node in the graph and the dependencies are regarded as edges. Furthermore, current spatial-temporal modeling (e.g., graph classification) methodologies based on graph neural networks (GNNs) are inherently flat and cannot hierarchically aggregate node information. To address these limitations, we propose a novel graph-pooling-based framework, MTPool, to obtain an expressive global representation of MTS. We first convert MTS slices into graphs using the interactions of variables via a graph structure learning module and obtain the spatial-temporal graph node features via a temporal convolutional module. To obtain a global graph-level representation, we design an "encoder-decoder"-based variational graph pooling module to create adaptive centroids for cluster assignments. Then, we combine GNNs and our proposed variational graph pooling layers for joint graph representation learning and graph coarsening, so that the graph is progressively coarsened to one node. Finally, a differentiable classifier uses this coarsened representation to obtain the final predicted class. Experiments on ten benchmark datasets showed that MTPool outperforms state-of-the-art strategies in the MTSC task.
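The idea of pooling a graph through cluster assignments until one node remains can be sketched as follows. This is not MTPool's variational pooling module: the centroids here are fixed by hand rather than learned, and the assignment is a plain softmax over negative squared distances, a common soft-assignment scheme swapped in for illustration. All names and values are hypothetical.

```python
import math

def soft_assign(features, centroids):
    """Row i holds node i's softmax membership over clusters (closer centroid, higher weight)."""
    assignment = []
    for feat in features:
        scores = [-sum((a - b) ** 2 for a, b in zip(feat, cen)) for cen in centroids]
        top = max(scores)                       # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        assignment.append([e / total for e in exps])
    return assignment

def coarsen(features, adjacency, assignment):
    """Pooled features S^T X and pooled adjacency S^T A S for assignment matrix S."""
    n, k, d = len(features), len(assignment[0]), len(features[0])
    pooled_features = [
        [sum(assignment[i][c] * features[i][j] for i in range(n)) for j in range(d)]
        for c in range(k)
    ]
    pooled_adjacency = [
        [sum(assignment[i][c] * adjacency[i][j] * assignment[j][e]
             for i in range(n) for j in range(n))
         for e in range(k)]
        for c in range(k)
    ]
    return pooled_features, pooled_adjacency

# Hypothetical toy graph: 4 variables (nodes) on a path, each with 2 features.
features = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0], [0.0, 4.0]]
adjacency = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

# Level 1: 4 nodes -> 2 clusters. Level 2: 2 clusters -> 1 node (graph-level vector).
s1 = soft_assign(features, [[1.5, 0.0], [0.0, 3.5]])
x1, a1 = coarsen(features, adjacency, s1)
s2 = soft_assign(x1, [[0.0, 0.0]])
graph_vector, _ = coarsen(x1, a1, s2)
```

Applying the pooling step twice shrinks the toy graph from four nodes to two clusters and then to one node, whose feature vector could be passed to a classifier.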
Subject(s)
Neural Networks, Computer , Time Factors
ABSTRACT
Dry mycelium fertilizer (DMF) was produced from penicillin fermentation fungi mycelium (PFFM) following an acid-heating pretreatment to degrade the residual penicillin. In this study, it was applied to soil as fertilizer to investigate its effects on soil properties, phytotoxicity, microbial community composition, enzyme activities, and the growth of snap bean in a greenhouse. The results show that the pH, total nitrogen, total phosphorus, total potassium, and organic matter of soil under the DMF treatments were generally higher than under the control (CON) treatment. In addition, the applied DMF did not cause heavy metal or residual drug pollution of the amended soil. The lowest GI values (<0.3) were recorded at DMF8 (36 kg DMF/plot) in the first days after applying the fertilizer, indicating that severe phytotoxicity appeared in the DMF8-amended soil. Microbial population and enzyme activity results showed that DMF was rapidly decomposed and that the decomposition process significantly affected microbial growth and enzyme activities. The phytotoxicity of the DMF-amended soil decreased later in the fertilization period. DMF1 was considered the optimum DMF dose based on principal component analysis scores. The plant height and yield of snap bean were remarkably enhanced with the optimum DMF dose.