ABSTRACT
Multi-omics data allow us to select a small set of informative markers for the discrimination of specific cell types and the study of cellular heterogeneity. However, it is often challenging to choose an optimal marker panel from high-dimensional molecular profiles for a large number of cell types. Here, we propose a method called Mixed Integer programming Model to Identify Cell type-specific marker panel (MIMIC). MIMIC maintains the hierarchical topology among different cell types and simultaneously maximizes the specificity of a fixed number of selected markers. MIMIC was benchmarked on the mouse ENCODE RNA-seq dataset, with 29 diverse tissues, for 43 surface markers (SMs) and 1345 transcription factors (TFs). MIMIC selects biologically meaningful markers and is robust to different accuracy criteria. It shows advantages over the standard single gene-based approaches and widely used dimensionality reduction methods, such as multidimensional scaling and t-SNE, both in accuracy and in biological interpretation. Furthermore, the combination of SMs and TFs achieves better specificity than SMs or TFs alone. Applying MIMIC to a large collection of 641 RNA-seq samples covering 231 cell types identifies a panel of TFs and SMs that reveals the modularity of cell type association networks. Finally, the scalability of MIMIC is demonstrated by selecting enhancer markers from mouse ENCODE data. MIMIC is freely available at https://github.com/MengZou1/MIMIC.
Subject(s)
Biomarkers , Computational Biology , Flow Cytometry/methods , Gene Expression Profiling/methods , Organ Specificity , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Regulation , Humans , Organ Specificity/genetics , Reproducibility of Results
ABSTRACT
Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.
ABSTRACT
Trace metals play a vital role in a variety of biological processes, but excessive amounts can be toxic and are receiving increasing attention. Trace metals in the environment are released from natural sources, such as rock weathering and volcanic eruptions, and from human activities, such as industrial emissions, mineral extraction, and vehicle exhaust. Lifestyle, dietary habits and environmental quality are the main sources of human exposure to trace metals, which play an important role in inducing human infertility. The purpose of this review is to summarize the distribution of various trace metals in oocytes and to identify the trace metals that may cause oocyte used in the design and execution of toxicological studies.
Subject(s)
Oocytes , Trace Elements , Humans , Oocytes/drug effects , Trace Elements/analysis , Trace Elements/adverse effects , Female , Environmental Exposure/adverse effects , Metals, Heavy/analysis , Metals/adverse effects , Metals/analysis
ABSTRACT
Despite considerable efforts to identify human liver cancer genomic alterations that might unveil druggable targets, the systematic translation of multiomics data remains challenging. Here, we report success in long-term culture of 64 patient-derived hepatobiliary tumor organoids (PDHOs) from a Chinese population. A divergent response to 265 metabolism- and epigenetics-related chemicals and 36 anti-cancer drugs is observed. Integration of the whole genome, transcriptome, chromatin accessibility profiles, and drug sensitivity results of 64 clinically relevant drugs defines over 32,000 genome-drug interactions. RUNX1 promoter mutation is associated with an increase in chromatin accessibility and a concomitant gene expression increase, promoting a cluster of drugs preferentially sensitive in hepatobiliary tumors. These results not only provide an annotated PDHO biobank of human liver cancer but also suggest a systematic approach for obtaining a comprehensive understanding of the gene-regulatory network of liver cancer, advancing the applications of potential personalized medicine.
Subject(s)
Antineoplastic Agents , Liver Neoplasms , Humans , Pharmacogenetics , Antineoplastic Agents/pharmacology , Antineoplastic Agents/metabolism , Liver Neoplasms/drug therapy , Liver Neoplasms/genetics , Organoids/pathology , Chromatin/metabolism
ABSTRACT
Accurate inference of context-specific gene regulatory networks (GRNs) from genomics data is a crucial task in computational biology. However, existing methods face limitations, such as reliance on gene expression data alone, lower resolution from bulk data, and data scarcity for specific cellular systems. Despite recent technological advancements, including single-cell sequencing and the integration of ATAC-seq and RNA-seq data, learning such complex mechanisms from limited independent data points still presents a daunting challenge, impeding GRN inference accuracy. To overcome this challenge, we present LINGER (LIfelong neural Network for GEne Regulation), a novel deep learning-based method to infer GRNs from single-cell multiome data with paired gene expression and chromatin accessibility data from the same cell. LINGER incorporates both 1) atlas-scale external bulk data across diverse cellular contexts and 2) the knowledge of transcription factor (TF) motif matching to cis-regulatory elements as a manifold regularization to address the challenge of limited data and extensive parameter space in GRN inference. Our results demonstrate that LINGER achieves two- to threefold higher accuracy than existing methods. LINGER reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Additionally, following GRN inference from reference sc-multiome data, LINGER allows for the estimation of TF activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies. Overall, LINGER provides a comprehensive tool for robust gene regulation inference from genomics data, empowering deeper insights into cellular mechanisms.
ABSTRACT
Despite recent developments, it is hard to profile all multi-omics single-cell data modalities on the same cell. Thus, huge amounts of single-cell genomics data of unpaired observations on different cells are generated. We propose a method named UnpairReg for the regression analysis on unpaired observations to integrate single-cell multi-omics data. On real and simulated data, UnpairReg provides an accurate estimation of cell gene expression where only chromatin accessibility data is available. The cis-regulatory network inferred from UnpairReg is highly consistent with eQTL mapping. UnpairReg improves cell type identification accuracy by joint analysis of single-cell gene expression and chromatin accessibility data.
Subject(s)
Chromatin , Genomics , Chromatin/genetics , Regression Analysis , Single-Cell Analysis
ABSTRACT
Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.
Subject(s)
Gene Regulatory Networks , Genome-Wide Association Study , Phenotype , Gene Expression Regulation , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Chromatin accessibility plays an essential role in controlling cellular identity and the therapeutic response of human cancers. However, the chromatin accessibility landscape and gene regulatory network of pancreatic cancer are largely uncharacterized. Here, we integrate the chromatin accessibility profiles of 84 pancreatic cancer organoid lines with whole-genome sequencing data, transcriptomic sequencing data and the results of drug sensitivity analysis of 283 epigenetic-related chemicals and 5 chemotherapeutic drugs. We identify distinct transcription factors that distinguish molecular subtypes of pancreatic cancer, predict numerous chromatin accessibility peaks associated with gene regulatory networks, discover regulatory noncoding mutations with potential as cancer drivers, and reveal the chromatin accessibility signatures associated with drug sensitivity. These results not only provide the chromatin accessibility atlas of pancreatic cancer but also suggest a systematic approach to comprehensively understand the gene regulatory network of pancreatic cancer in order to advance diagnosis and potential personalized medicine applications.
Subject(s)
Chromatin , Pancreatic Neoplasms , Chromatin/genetics , Gene Regulatory Networks , Humans , Organoids , Pancreatic Neoplasms/drug therapy , Pancreatic Neoplasms/genetics , Transcriptome , Pancreatic Neoplasms
ABSTRACT
Heart failure with preserved ejection fraction (HFpEF) is a complex disease characterized by dysfunctions in the heart, adipose tissue, and cerebral arteries. Elucidating the interactions between these three tissues will improve our understanding of the mechanism of HFpEF. In this study, we propose a multilevel comparative framework based on differentially expressed genes (DEGs) and differentially correlated gene pairs (DCGs) to investigate the shared and unique pathological features among the three tissues in HFpEF. At the network level, functional enrichment analysis revealed that the networks of the heart, adipose tissue, and cerebral arteries were enriched in the cell cycle and immune response. The networks of the heart and adipose tissue were enriched in hemostasis, G-protein coupled receptor (GPCR) ligand, and cancer-related pathways. The heart-specific networks were enriched in the inflammatory response and cardiac hypertrophy, while the adipose-tissue-specific networks were enriched in the response to peptides and regulation of cell adhesion. The cerebral-artery-specific networks were enriched in gene expression (transcription). At the module and gene levels, 5 housekeeping DEGs, 2 housekeeping DCGs, 6 modules of the merged protein-protein interaction network, 5 tissue-specific hub genes, and 20 shared hub genes were identified through comparative analysis of tissue pairs. Furthermore, therapeutic drugs for HFpEF targeting these genes were examined using molecular docking. The combination of multitissue and multilevel comparative frameworks is a potential strategy for the discovery of effective therapy and personalized medicine for HFpEF.
ABSTRACT
Transcription factors (TFs) define cellular identity either by activating the target cell program or by silencing the donor program, as demonstrated by intensive cell reprogramming studies. Here, we propose an extended minimum set cover model with stable selection (3Scover) to systematically identify silencing TFs, named safeguard TFs, from omics data. First, a cell type-TF specificity network is constructed to systematically link cell types with their specifically expressed TFs. Then we search for the minimum TF set that covers this network with the "many but one specificity" characteristic and integrate many subsampling models to obtain a stable solution. 3Scover identified 30 safeguard TFs in human and mouse. These safeguard TFs are significantly enriched in the experimentally discovered reprogramming panel with their protein-protein interactors. In addition, they tend to interact closely with chromatin regulators, negatively regulate transcription, and function earlier in development. Collectively, 3Scover allows us to probe master TFs and combinatorial regulation in controlling cell identity.
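For orientation, the classical minimum set cover problem that 3Scover extends can be sketched as follows. This is a minimal illustration using the textbook greedy approximation, not the authors' extended model: it omits the "many but one specificity" constraint and the subsampling-based stable selection. The function name `greedy_set_cover`, the TF labels, and the toy network are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[str], covers: Dict[str, Set[str]]) -> List[str]:
    """Greedy approximation to minimum set cover (not the 3Scover model itself).

    universe: items to cover (here, hypothetical cell types).
    covers:   maps each candidate (here, a hypothetical TF) to the items it covers.
    Returns the chosen candidates in the order they were picked.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered items;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda tf: len(covers[tf] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError("some items cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy cell type-TF network (illustrative labels only, not data from the study).
toy_network = {
    "TF_A": {"neuron", "astrocyte", "hepatocyte"},
    "TF_B": {"hepatocyte", "fibroblast"},
    "TF_C": {"fibroblast", "myocyte"},
    "TF_D": {"neuron"},
}
cell_types = {"neuron", "astrocyte", "hepatocyte", "fibroblast", "myocyte"}

panel = greedy_set_cover(cell_types, toy_network)
print(panel)  # ['TF_A', 'TF_C']
```

The greedy rule is chosen here only because it is short and dependency-free; an exact minimum cover would require an integer programming solver.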
ABSTRACT
Polycystic ovary syndrome (PCOS) is a common reproductive endocrine disease characterized by persistent anovulation and hyperandrogenism, affecting approximately 8-10% of women of childbearing age and occupying an important position in the etiology of infertility. There is increasing evidence that long non-coding RNAs (lncRNAs) are involved in the development of PCOS, but the potential regulatory mechanism is still unclear. This study performed high-throughput lncRNA sequencing of follicular fluid exosomes in non-PCOS infertility patients and PCOS infertility patients. The sequencing results led to the identification of 1,253 upregulated and 613 downregulated lncRNAs from a total of 1,866 detected candidates. There was no significant difference between the PCOS patients and non-PCOS patients in body mass index (BMI) or the fasting blood glucose (FBG) level. However, luteinizing hormone (LH), estradiol (E2), testosterone (T), serum prolactin (PRL), and anti-Mullerian hormone (AMH) levels were clearly higher in PCOS patients than in non-PCOS patients. The LH/FSH ratio was also increased (>2) in the PCOS patients. Functional analysis showed enrichment of pathways related to endocytosis, Hippo signaling, MAPK signaling, and HTLV-1 infection. These results suggest that lncRNAs may play an important role in the pathogenesis of PCOS and may be potential targets for the diagnosis and treatment of PCOS.
ABSTRACT
High-altitude adaptation of Tibetans represents a remarkable case of natural selection during recent human evolution. Previous genome-wide scans found many non-coding variants under selection, suggesting a pressing need to understand the functional role of non-coding regulatory elements (REs). Here, we generate time courses of paired ATAC-seq and RNA-seq data on cultured HUVECs under hypoxic and normoxic conditions. We further develop a variant interpretation methodology (vPECA) to identify active selected REs (ASREs) and associated regulatory network. We discover three causal SNPs of EPAS1, the key adaptive gene for Tibetans. These SNPs decrease the accessibility of ASREs with weakened binding strength of relevant TFs, and cooperatively down-regulate EPAS1 expression. We further construct the downstream network of EPAS1, elucidating its roles in hypoxic response and angiogenesis. Collectively, we provide a systematic approach to interpret phenotype-associated noncoding variants in proper cell types and relevant dynamic conditions, to model their impact on gene regulation.
Subject(s)
Acclimatization/genetics , Chromatin/metabolism , Ethnicity/genetics , Gene Regulatory Networks , Models, Genetic , Altitude , Altitude Sickness/ethnology , Altitude Sickness/genetics , Altitude Sickness/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Hypoxia/genetics , Cells, Cultured , Chromatin/genetics , Chromatin Immunoprecipitation Sequencing , Disease Resistance/genetics , Female , Gene Expression Regulation , Human Umbilical Vein Endothelial Cells , Humans , Hypoxia/genetics , Hypoxia/metabolism , Oxygen/metabolism , Polymorphism, Single Nucleotide , Pregnancy , Primary Cell Culture , RNA-Seq , Regulatory Elements, Transcriptional/genetics , Selection, Genetic , Tibet/ethnology , Transcription Factors/metabolism , Whole Genome Sequencing
ABSTRACT
Although cancer is commonly perceived as a disease of dedifferentiation, the hallmark of early-stage prostate cancer is paradoxically the loss of more plastic basal cells and the abnormal proliferation of more differentiated secretory luminal cells. However, the mechanism of prostate cancer proluminal differentiation is largely unknown. Through integrating analysis of the transcription factors (TFs) from 806 human prostate cancers, we found that ERG was highly correlated with prostate cancer luminal subtyping. ERG overexpression in luminal epithelial cells inhibited those cells' normal plasticity to transdifferentiate into a basal lineage, and ERG superseded PTEN loss, which favored basal differentiation. ERG KO disrupted prostate cell luminal differentiation, whereas AR KO had no such effects. Trp63 is a known master regulator of the prostate basal lineage. Through analysis of 3D chromatin architecture, we found that ERG bound and inhibited the enhancer activity and chromatin looping of a Trp63 distal enhancer, thereby silencing its gene expression. Specific deletion of the distal ERG binding site resulted in the loss of ERG-mediated inhibition of basal differentiation. Thus, ERG, in its fundamental role in lineage differentiation in prostate cancer initiation, orchestrated chromatin interactions and regulated prostate cell lineage toward a proluminal program.
Subject(s)
Cellular Reprogramming , Epithelial Cells/metabolism , Oncogene Proteins/metabolism , Prostatic Neoplasms/metabolism , Transcriptional Regulator ERG/metabolism , Animals , Epithelial Cells/pathology , Gene Knockout Techniques , Male , Mice , Mice, Transgenic , Oncogene Proteins/genetics , PTEN Phosphohydrolase/genetics , PTEN Phosphohydrolase/metabolism , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Receptors, Androgen/genetics , Receptors, Androgen/metabolism , Transcriptional Regulator ERG/genetics
ABSTRACT
The enantiomeric separation of unmodified D,L-isoleucine was achieved in citric acid-zinc(II) medium by capillary electrophoresis (CE) with a contactless conductivity detector (C4D). In conventional chiral separation methods for amino acids, a chiral complex is added to the eluent as the chiral selector in order to yield a chiral environment. In this study, however, a non-chiral solution, i.e. 2.8 mmol/L NaOH + 0.8 mmol/L citric acid + 2.0 mmol/L zinc acetate, was used as the running buffer, and the citric acid-zinc(II) complex acted as the chiral selector. Under the optimum experimental conditions (uncoated fused-silica capillary, 45 cm×50 µm, Leff=40 cm; separation voltage of +13 kV; electrokinetic injection of 11 kV×8 s), the enantiomers of D,L-isoleucine were baseline separated within 8 min with a resolution (Rs) of 2.0. The calibration curve of each enantiomer showed good linearity in the range from 1.0 mg/L to 20 mg/L, with a limit of detection of 0.40 mg/L. The intra- and inter-day precisions were examined. The RSDs of peak area and migration time were below 5.0% and 2.5% (n=6), respectively, indicating good repeatability (intra-day) and reproducibility (inter-day) of the method. Interference was also tested, and other common amino acids did not interfere with the detection. The proposed method provides a potential new way to further investigate the enantioseparation of unmodified or native amino acids.
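As background on the reported metric, resolution between two adjacent peaks is conventionally computed from migration times and baseline peak widths. The abstract does not state which definition was used, so the formula and the symbols $t_1$, $t_2$ (migration times of the two enantiomers) and $w_1$, $w_2$ (baseline peak widths) are assumptions introduced here for illustration:

```latex
\[
  R_s = \frac{2\,(t_2 - t_1)}{w_1 + w_2}
\]
```

Under this convention, $R_s \geq 1.5$ is commonly taken to indicate baseline separation, which is consistent with the reported value of 2.0.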
Subject(s)
Citric Acid , Electrophoresis, Capillary , Isoleucine/isolation & purification , Zinc , Buffers , Reproducibility of Results , Stereoisomerism
ABSTRACT
OBJECTIVE: The aim of this study was to establish the relationship between miR-124-3p and Aurora A kinase (AURKA) in bladder cancer (BC). METHODS: The expression of miR-124-3p and AURKA in BC tissues and cell lines was detected using RT-PCR and western blot. BC cells were transfected with miR-124-3p mimics and AURKA siRNA. After transfection, cell proliferation, migration, cell cycle and apoptosis were measured using CCK-8, colony formation assay, wound healing assay and cytometry tests. The relationship between miR-124-3p and AURKA was confirmed with a luciferase reporter assay. Mouse xenograft models were constructed to examine the effects of AURKA on BC in vivo. RESULTS: MiR-124-3p expression was significantly down-regulated in BC tissues and cell lines, while AURKA was significantly up-regulated compared to normal samples. MiR-124-3p targeted AURKA and decreased its expression. Transfection of miR-124-3p mimics and AURKA siRNA down-regulated BC cell proliferation and migration and induced cell apoptosis. In xenograft models, inhibition of AURKA effectively suppressed tumor growth. CONCLUSION: MiR-124-3p has a significant impact on proliferation, migration and apoptosis of BC cells by targeting AURKA.