ABSTRACT
The emergence of single-cell RNA sequencing (scRNA-seq) technology has revolutionized the identification of cell types and the study of cellular states at single-cell resolution. Despite its significant potential, scRNA-seq data analysis is plagued by missing values. Many existing imputation methods rely on simplistic assumptions about the data distribution while ignoring the intrinsic, cell-specific gene expression distribution. This work presents scMultiGAN, a novel deep-learning model for scRNA-seq imputation that uses multiple collaborative generative adversarial networks (GANs). Unlike traditional GAN-based imputation methods that generate missing values from random noise, scMultiGAN employs a two-stage training process and uses multiple GANs to achieve cell-specific imputation. Experimental results demonstrate the efficacy of scMultiGAN in imputation accuracy, cell clustering, differential gene expression analysis and trajectory analysis, where it significantly outperforms existing state-of-the-art techniques. Additionally, scMultiGAN is scalable to large scRNA-seq datasets and performs consistently well across sequencing platforms. The scMultiGAN code is freely available at https://github.com/Galaxy8172/scMultiGAN.
Subjects
Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Cluster Analysis , Exome Sequencing , Data Analysis , Sequence Analysis, RNA , Gene Expression Profiling
ABSTRACT
The innovation of biotechnologies has allowed omics data to accumulate at an alarming rate, ushering in the era of 'big data'. Extracting valuable inherent knowledge from diverse omics data remains a daunting problem in bioinformatics, and better solutions often require more innovative methods for efficient handling and effective results. Recent advances in the integrated analysis and computational modeling of multi-omics data have helped address these needs in an increasingly harmonious manner. The development and application of machine learning have greatly advanced our insights into biology and biomedicine and promoted the development of therapeutic strategies, especially for precision medicine. Here, we present a comprehensive survey and discussion of what happened, is happening and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advances at the interface between machine learning and an ever-widening range of omics, including genomics, transcriptomics, proteomics, metabolomics and radiomics, as well as omics at single-cell resolution. We also discuss and synthesize ideas, new insights, current challenges and perspectives of machine learning in omics.
Subjects
Artificial Intelligence , Machine Learning , Computational Biology/methods , Genomics/methods , Proteomics/methods
ABSTRACT
OBJECTIVE: To analyse the global burden, trends and cross-country inequalities of female breast and gynaecologic cancers (FeBGCs). DESIGN: Population-based study. SETTING: Data sourced from the Global Burden of Disease Study 2019. POPULATION: Individuals diagnosed with FeBGCs. METHODS: Age-standardised mortality rates (ASMRs), age-standardised disability-adjusted life year (DALY) rates (ASDRs) and their 95% uncertainty intervals (UIs) described the burden. Estimated annual percentage changes (EAPCs) of age-standardised rates (ASRs) and their confidence intervals (CIs) illustrated trends. Social inequalities were quantified using the slope index of inequality (SII) and the concentration index. MAIN OUTCOME MEASURES: The main outcome measures were the burden of FeBGCs and the trends in their inequalities over time. RESULTS: In 2019, the ASDRs per 100 000 females were as follows: breast cancer, 473.83 (95% UI: 437.30-510.51); cervical cancer, 210.64 (95% UI: 177.67-234.85); ovarian cancer, 124.68 (95% UI: 109.13-138.67); and uterine cancer, 210.64 (95% UI: 177.67-234.85). The annual trends from 1990 to 2019, expressed as EAPCs of ASDRs, were: breast cancer, -0.51 (95% CI: -0.57 to -0.45); cervical cancer, -0.95 (95% CI: -0.99 to -0.89); ovarian cancer, -0.08 (95% CI: -0.12 to -0.04); and uterine cancer, -0.84 (95% CI: -0.93 to -0.75). In the social inequality analysis (1990-2019), the SII changed from 689.26 to 607.08 for breast cancer, from -226.66 to -239.92 for cervical cancer, from 222.45 to 228.83 for ovarian cancer and from 74.61 to 103.58 for uterine cancer. Concentration index values ranged from 0.2 to 0.4. CONCLUSIONS: The global burden of FeBGCs showed a downward trend from 1990 to 2019. Countries or regions with a higher Socio-demographic Index (SDI) bear a higher DALY burden of breast, ovarian and uterine cancers, while those with a lower SDI bear a heavier burden of cervical cancer. These inequalities increased over time.
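The abstract reports trends as EAPCs but does not state how they were computed. In GBD-based analyses, the EAPC is conventionally derived from a log-linear regression of the age-standardised rate on calendar year, ln(ASR) = α + β × year, with EAPC = 100 × (exp(β) - 1). The Python sketch below assumes that convention. The `eapc` function and the synthetic series are illustrative only; they are neither the study's code nor its data.

```python
import math

def eapc(years, rates):
    """Estimated annual percentage change from a log-linear fit (assumed GBD convention).

    Fits ln(rate) = alpha + beta * year by ordinary least squares and
    returns 100 * (exp(beta) - 1). All rates must be positive.
    """
    if len(years) != len(rates) or len(years) < 2:
        raise ValueError("need at least two (year, rate) pairs")
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    # OLS slope of ln(rate) on year.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    beta = sxy / sxx
    return 100.0 * (math.exp(beta) - 1.0)

# Synthetic series (not study data): a rate that falls by exactly 1% per year.
years = list(range(1990, 2020))
rates = [500.0 * 0.99 ** (y - 1990) for y in years]
print(round(eapc(years, rates), 4))  # -1.0
```

Under this convention, an EAPC of -0.51, as reported for breast cancer ASDRs, corresponds to an average decline of roughly 0.5% per year. The confidence interval would come from the standard error of β, which this sketch omits.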
ABSTRACT
RNA sequencing (RNA-seq) is a powerful technology for studying human transcriptome variation. We introduce PAIRADISE (Paired Replicate Analysis of Allelic Differential Splicing Events), a method for detecting allele-specific alternative splicing (ASAS) from RNA-seq data. Unlike conventional approaches that detect ASAS events one sample at a time, PAIRADISE aggregates ASAS signals across multiple individuals in a population. By treating the two alleles of an individual as paired, and multiple individuals sharing a heterozygous SNP as replicates, we formulate ASAS detection using PAIRADISE as a statistical problem for identifying differential alternative splicing from RNA-seq data with paired replicates. PAIRADISE outperforms alternative statistical models in simulation studies. Applying PAIRADISE to replicate RNA-seq data of a single individual and to population-scale RNA-seq data across many individuals, we detect ASAS events associated with genome-wide association study (GWAS) signals of complex traits or diseases. Additionally, PAIRADISE ASAS analysis detects the effects of rare variants on alternative splicing. PAIRADISE provides a useful computational tool for elucidating the genetic variation and phenotypic association of alternative splicing in populations.
Subjects
Alternative Splicing/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Transcriptome/genetics , Alleles , Female , Gene Expression Profiling , Genetics, Population/methods , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Male , Models, Statistical , RNA-Seq , Exome Sequencing
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) enables the characterization of transcriptomic profiles at single-cell resolution with increasingly high throughput. However, it suffers from many sources of technical noise, including insufficient mRNA molecules, which lead to excess false zero values termed dropouts. Computational approaches have been proposed to recover biologically meaningful expression by borrowing information from similar cells in the observed dataset. However, these methods suffer from oversmoothing and remove the natural cell-to-cell stochasticity in gene expression. Here, we propose scIGANs, a generative adversarial network (GAN)-based method for scRNA-seq imputation that uses generated cells rather than observed cells to avoid these limitations and balances performance between major and rare cell populations. Evaluations on a variety of simulated and real scRNA-seq datasets show that scIGANs is effective for dropout imputation and improves various downstream analyses. scIGANs is robust to small datasets that have very few genes with low expression and/or low cell-to-cell variance. scIGANs works equally well on datasets from different scRNA-seq protocols and scales to datasets with over 100 000 cells. We provide multiple lines of compelling evidence that scIGANs is not merely an application of GANs to omics data but also a competitive imputation method for scRNA-seq data.
Subjects
RNA-Seq/methods , Single-Cell Analysis/methods , Software , Transcriptome/genetics , Computational Biology , RNA/genetics , RNA, Messenger/genetics , Exome Sequencing/methods
ABSTRACT
Alternative splicing (AS) is a genetically and epigenetically regulated pre-mRNA processing step that increases transcriptome and proteome diversity. Comprehensively decoding these regulatory mechanisms holds promise for gaining deeper insights into a variety of biological contexts involving AS, such as development and disease. We assembled a splicing (epi)genetic code, DeepCode, for human embryonic stem cell (hESC) differentiation by integrating heterogeneous features from genomic sequences and 16 histone modifications in a multi-label deep neural network. By leveraging epigenetic features, DeepCode significantly improves performance in predicting splicing patterns and their changes during hESC differentiation. DeepCode also reveals the superiority of epigenomic features and their dominant roles in decoding AS patterns, highlighting the need to include epigenetic properties when assembling a more comprehensive splicing code. Moreover, DeepCode makes robust predictions across cell lineages and datasets. In particular, we identified a putative H3K36me3-regulated AS event leading to nonsense-mediated mRNA decay of BARD1. Reduced BARD1 expression attenuates ATM/ATR signalling activities and, in turn, hESC differentiation. These results suggest a novel candidate mechanism linking histone modifications to hESC fate decisions. In addition, when trained in different contexts, DeepCode can be extended to a variety of biological and biomedical fields.
Subjects
Alternative Splicing , Embryonic Stem Cells/metabolism , Epigenesis, Genetic , Histone Code , Machine Learning , Neural Networks, Computer , Cell Differentiation/genetics , Cell Line , Cell Lineage , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, RNA , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism
ABSTRACT
Module identification is a frequently used approach for mining significant local structures in global networks. Recently, a wide variety of bilayer networks have emerged to characterize more complex biological processes. Given the special topological properties of bilayer networks and the accompanying challenges, there is still no effective method for identifying bilayer modules to probe the modular organization of these informative bilayer networks. To this end, we propose the pseudo-3D clustering algorithm, which starts by extracting initial non-hierarchically organized modules and then iteratively deciphers the hierarchical organization of modules using a bottom-up strategy. Specifically, a modularity function for bilayer modules was proposed so that the algorithm reports the optimal partition, the one that most accurately characterizes the bilayer network. Simulation studies demonstrated its robustness and superior performance compared with competing methods. Applications to both soybean and human miRNA-gene bilayer networks demonstrated that the pseudo-3D clustering algorithm successfully identified overlapping, hierarchically organized and highly cohesive bilayer modules. Analyses of topology, functional and human disease enrichment, and the bilayer subnetwork involved in soybean fat biosynthesis provided both theoretical and biological evidence supporting the effectiveness and robustness of the pseudo-3D clustering algorithm.
Subjects
Cluster Analysis , Computational Biology/methods , Gene Regulatory Networks , MicroRNAs/genetics , Algorithms , Computer Simulation , Fatty Acids/biosynthesis , Reproducibility of Results , Glycine max/genetics , Glycine max/metabolism
ABSTRACT
MOTIVATION: The rapid accumulation of microRNAs (miRNAs) and experimental evidence for miRNA interactions has ushered in a new area of miRNA research that focuses on networks rather than individual miRNA interactions, providing a systematic view of the whole microRNome. It is therefore a challenge to infer miRNA functional interactions on a system-wide level and to construct a miRNA functional network (miRFN). A few studies have focused on the well-studied human species; however, these methods can neither be extended to other non-model organisms nor fully take into account the information embedded in miRNA-target and target-target interactions. Thus, it is important to develop appropriate methods for inferring the miRNA networks of non-model species, such as soybean (Glycine max), which lack extensive miRNA-phenotype association data comparable to the miRNA-disease associations available for human. RESULTS: Here we propose a new method to measure the functional similarity of miRNAs that considers both site accessibility and the interactive context of target genes in functional gene networks. We further construct the miRFNs of soybean, the first study of soybean miRNAs at the network level, and the core methods can be easily extended to other species. We found that soybean miRFNs exhibit a scale-free, small-world and modular architecture, with degree distributions best fit by power-law and exponential distributions. We also showed that miRNAs with high degree tend to interact with those of low degree, which reveals the disassortativity and modularity of miRFNs. Our efforts in this study will be useful for further revealing soybean miRNA-miRNA and miRNA-gene interaction mechanisms at a systematic level. AVAILABILITY AND IMPLEMENTATION: A web tool for information retrieval and analysis of soybean miRFNs and the relevant target functional gene networks can be accessed at SoymiRNet: http://nclab.hit.edu.cn/SoymiRNet.
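The tendency of high-degree miRNAs to connect to low-degree ones is usually quantified by Newman's degree assortativity coefficient r, which is negative for disassortative networks. The abstract does not say which statistic the authors used. The sketch below is a generic, dependency-free implementation applied to a toy edge list. The node names are hypothetical and do not correspond to soybean miRNAs.

```python
from collections import Counter

def degree_assortativity(edges):
    """Newman's degree assortativity coefficient for an undirected simple graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    Returns r in [-1, 1]; r < 0 means high-degree nodes tend to link
    to low-degree nodes (disassortative mixing).
    """
    edges = list(edges)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m = len(edges)
    # Edge-averaged statistics of the degrees j, k at the two ends of each edge.
    s_prod = sum(deg[u] * deg[v] for u, v in edges) / m
    s_mean = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / m
    s_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / m
    denom = s_sq - s_mean ** 2
    if denom == 0:
        raise ValueError("assortativity is undefined when all edge-end degrees are equal")
    return (s_prod - s_mean ** 2) / denom

# Toy "hub" network (hypothetical names): one hub linked to four leaf nodes.
star = [("miR_hub", f"miR_leaf{i}") for i in range(4)]
print(degree_assortativity(star))  # -1.0 (perfectly disassortative)
```

A value near 0 indicates no degree correlation. For real networks, `networkx.degree_assortativity_coefficient` computes the same statistic.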
Subjects
Gene Regulatory Networks , Glycine max/genetics , MicroRNAs/genetics , RNA, Plant/genetics , Cluster Analysis , MicroRNAs/chemistry , Nucleic Acid Conformation , RNA, Plant/chemistry , Transcriptome
ABSTRACT
Existing methods for computing the semantic similarity between Gene Ontology (GO) terms are often based on external datasets and are therefore not intrinsic to GO. Furthermore, when used to measure the similarity of proteins, they not only fail to handle identical annotations but also show a strong bias toward well-annotated proteins. Inspired by cellular differentiation and dedifferentiation in developmental biology, we propose a shortest semantic differentiation distance (SSDD), based on the concept of semantic totipotency, to measure the semantic similarity of GO terms and further compare the functional similarity of proteins. Using human ratings and a benchmark dataset, we found that SSDD improves upon existing methods for computing the semantic similarity of GO terms. An in-depth analysis shows that SSDD can distinguish identical annotations and does not depend on annotation richness, thus producing less biased and more reliable results. Online services are available at the Gene Functional Similarity Analysis Tools website (GFSAT: http://nclab.hit.edu.cn/GFSAT).
Subjects
Genes , Molecular Sequence Annotation , Software , Vocabulary, Controlled , Genomics/methods , Semantics , Sequence Analysis, DNA
ABSTRACT
Cellular senescence (CS) is characterized by irreversible cell cycle arrest and plays a key role in aging and in diseases such as cancer. Recent years have witnessed burgeoning exploration of the intricate relationship between CS and cancer, with CS recognized as either a suppressing or a promoting factor and officially acknowledged as one of the 14 hallmarks of cancer. However, a comprehensive characterization elucidating how this relationship diverges across cancer types and how CS is involved in the multiple facets of tumor development is still lacking. Here we systematically assessed the cellular senescence of over 10,000 tumor samples from 33 cancer types, starting by defining a set of cancer-associated CS signatures and deriving a quantitative metric of CS status, called the CS score. We then investigated CS heterogeneity and its intricate relationship with prognosis, immune infiltration and therapeutic responses across cancers. As a result, cellular senescence separated cancers into two distinct prognostic groups: a protective group of eleven cancers, such as LIHC, and a risky group of four cancers, including STAD. Subsequent in-depth comparisons between these two groups unveiled potential molecular and cellular mechanisms underlying the distinct effects of cellular senescence, involving divergent activation of specific pathways and differences in immune cell infiltration. These results were further supported by the disparate associations of CS status with responses to immunotherapy and chemotherapy observed between the two groups. Overall, our study offers a deeper understanding of the inter-tumor heterogeneity of cellular senescence in relation to the tumor microenvironment and cancer prognosis.
ABSTRACT
BACKGROUND: Individuals with renal cell carcinoma (RCC) exhibit substantial heterogeneity in histomorphology, genetic alterations in the proteome, immune cell infiltration patterns and clinical behavior. OBJECTIVES: This study uses single-nucleus sequencing of ten samples (four normal, three clear cell renal cell carcinoma (ccRCC) and three chromophobe renal cell carcinoma (chRCC)) to uncover the pathogenic origins and prognostic characteristics of RCC. METHODS: Using two algorithms, inferCNV and k-means, the study identifies malignant cells and compares them with the normal group to reveal their origins. We further explore pathogenic factors at the gene level through Summary-data-based Mendelian Randomization and colocalization methods. Based on the relevant malignant markers, a total of 212 machine-learning combinations were compared to develop a prognostic signature with high precision and stability. Finally, the study correlates cell subtypes with clinical data to investigate which of them may affect patient prognosis. RESULTS & CONCLUSION: Two main tumor cells of origin were identified, proximal tubule cell B and intercalated cell type A, which were highly differentiated in epithelial cells, and three gene loci were identified as potentially pathogenic. The best malignant signature among the 212 prognostic models demonstrated high predictive power in ccRCC (AUC: 0.920 (1-year), 0.920 (3-year) and 0.930 (5-year) in the training dataset; 0.756 (1-year), 0.828 (3-year) and 0.832 (5-year) in the testing dataset). In addition, we confirmed that LYVE1+ tissue-resident macrophages and TOX+ CD8+ T cells significantly affect the prognosis of ccRCC patients, while monocytes play a crucial role in the prognosis of chRCC patients.
Subjects
Carcinoma, Renal Cell , Kidney Neoplasms , Tumor Microenvironment , Humans , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Tumor Microenvironment/genetics , Prognosis , Biomarkers, Tumor/genetics , Single-Cell Analysis/methods , Male , Machine Learning , Gene Expression Regulation, Neoplastic , Female
ABSTRACT
Methamphetamine (METH) is a highly addictive psychostimulant that causes physical and psychological damage and immune system disorders, particularly in the liver, which contains a large number of immune cells. Dopamine, a key neurotransmitter in METH addiction and immune regulation, plays a crucial role in this process. Here, we developed a chronic METH administration model and performed single-cell RNA sequencing (scRNA-seq) to investigate the effect of METH on liver immune cells and the involvement of dopamine receptor D1 (DRD1). Our findings reveal that chronic METH exposure induces immune cell identity shifts from Ifitm3+ macrophages (Mac) and Ccl5+ Mac to Cd14+ Mac, and from Fyn+ CD4+ T effector (Teff) cells, CD8+ T cells and natural killer T (NKT) cells to Fos+ CD4+ T cells and Rora+ group 2 innate lymphoid cells (ILC2), along with suppression of multiple functional immune pathways. DRD1 is implicated in regulating certain pathways and identity shifts among hepatic immune cells. Our results provide valuable insights for the development of targeted therapies to mitigate METH-induced immune impairment.
ABSTRACT
Male infertility has emerged as a global issue, partly attributable to psychological stress. However, the cellular and molecular mechanisms underlying the adverse effects of psychological stress on male reproductive function remain elusive. We established a psychological stress model using terrifying sounds and profiled the testes of stressed and control rats using single-cell RNA sequencing. Comparative and comprehensive transcriptome analyses of 11,744 testicular cells depicted the cellular landscape of spermatogenesis and revealed significant molecular alterations in spermatogenesis under psychological stress. At the cellular level, stressed rats exhibited delayed spermatogenesis at the spermatogonia and pachytene phases, resulting in reduced sperm production. Additionally, psychological stress rewired cellular interactions among germ cells, negatively affecting reproductive development. At the molecular level, we observed down-regulation of antioxidation-related genes and up-regulation of genes promoting reactive oxygen species (ROS) generation in the stress group. These alterations led to elevated ROS levels in the testes, affecting the expression of key regulators such as ATF2 and STAR, which caused reproductive damage through apoptosis or inhibition of testosterone synthesis. Overall, our study aimed to uncover the cellular and molecular mechanisms by which psychological stress disrupts spermatogenesis, offering insights into psychological stress-induced male infertility in other species and pointing to potential therapeutic targets.
ABSTRACT
As the most common type of renal cell carcinoma (RCC), clear cell renal cell carcinoma (ccRCC) is highly malignant and insensitive to chemotherapy or radiotherapy. Although systemic immunotherapies have been successfully applied to ccRCC in recent years, identifying the patients who can benefit most from these therapies remains essential and challenging because of the immunological heterogeneity of ccRCC patients. To this end, we conducted an in-depth investigation of the expression and clinical data of ccRCC from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). We identified a total of 946 differentially expressed immune-related genes. Among them, five independent genes, SHC1, WNT5A, NRP1, TGFA and IL4R, were significantly associated with survival and were used to construct the immune-related prognostic differential gene signature (IRPDGs). ccRCC patients were then categorized into high-risk and low-risk subgroups based on the median IRPDGs risk score. The IRPDGs subgroups display distinct genomic and immunological characteristics. Known immunotherapy-related genes show different mutation burdens: the mutation rate of VHL was higher than 40% in both IRPDGs subgroups, whereas SETD2 and BAP1 mutations differed most between the two groups, with higher frequencies in the high-risk subgroup. Moreover, the IRPDGs subgroups differed in the abundance of tumor-infiltrating immune cells (TIICs) and in immunotherapy efficacy. Plasma cells, regulatory T cells (Tregs), follicular helper T cells (Tfh) and M0 macrophages were enriched in the high-risk group, which had a higher tumor immune dysfunction and exclusion (TIDE) score. In contrast, the low-risk group had abundant M1 macrophage, resting mast cell and resting dendritic cell infiltrates, a lower TIDE score and greater benefit from immune checkpoint inhibitor (ICI) treatment.
Compared with other biomarkers, such as TIDE and the tumor inflammation signature (TIS), IRPDGs proved to be a better biomarker for assessing the prognosis of ccRCC and the efficacy of ICI treatment, with promise for selecting suitable patients for specific immunotherapies.
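Stratifying patients at the median of a gene-signature risk score is a common pattern. The score is a linear combination of the signature genes' expression values, usually weighted by Cox regression coefficients, and patients above the median are labelled high-risk. The abstract names the five IRPDG genes but not their weights. The coefficients and expression values below are therefore hypothetical placeholders, and the tie-handling rule (scores equal to the median go to the low-risk group) is an assumption.

```python
from statistics import median

# Hypothetical weights for the five IRPDG genes named in the abstract;
# the real model coefficients are not reported there.
COEFFICIENTS = {"SHC1": 0.30, "WNT5A": 0.10, "NRP1": 0.20, "TGFA": -0.15, "IL4R": 0.25}

def risk_score(expression):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())

def stratify(patients):
    """Split patients into high/low risk at the median score.

    patients: dict mapping patient ID -> {gene: expression}.
    Scores equal to the median are assigned to the low-risk group (an assumption).
    """
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Toy cohort with made-up expression values (not TCGA/ICGC data).
cohort = {
    "P1": {"SHC1": 5.0, "WNT5A": 2.0, "NRP1": 4.0, "TGFA": 1.0, "IL4R": 3.0},
    "P2": {"SHC1": 1.0, "WNT5A": 1.0, "NRP1": 1.0, "TGFA": 3.0, "IL4R": 1.0},
    "P3": {"SHC1": 3.0, "WNT5A": 2.0, "NRP1": 2.0, "TGFA": 2.0, "IL4R": 2.0},
    "P4": {"SHC1": 0.5, "WNT5A": 0.5, "NRP1": 0.5, "TGFA": 4.0, "IL4R": 0.5},
}
print(stratify(cohort))  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

In survival studies the weights would typically come from a fitted multivariable Cox model. The median cutoff is usually derived in the training cohort and then reused unchanged on validation cohorts.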
Subjects
Carcinoma, Renal Cell , Kidney Neoplasms , Biomarkers , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/therapy , Humans , Immunotherapy , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Kidney Neoplasms/therapy , Prognosis
ABSTRACT
Necroptosis is a programmed form of necrotic cell death that regulates cancer oncogenesis, progression and the tumor microenvironment (TME), and it can drive tumor-infiltrating cells to release pro-inflammatory cytokines, eliciting strong immune responses. Currently, few biomarkers are applied in clinical immunotherapy, and it is increasingly recognized that high levels of tumor necroptosis can enhance the response to immunotherapy. However, a comprehensive characterization of necroptosis associated with the TME and immunotherapy in hepatocellular carcinoma (HCC) remains unexplored. Here, we computationally characterized the necroptosis landscape in HCC samples from the TCGA and ICGC cohorts and stratified them into two necroptosis clusters (A and B) with significantly different clinical prognosis, immune cell function and TME landscapes. To further evaluate the necroptosis level of each sample, we established a novel necroptosis-related gene score (NRGscore). We further investigated the TME, tumor mutational burden (TMB), clinical response to immunotherapy and chemotherapeutic drug sensitivity of HCC subgroups stratified by necroptosis landscape. The NRGscore is robust and highly predictive of HCC clinical outcomes. Further analysis indicated that the high-NRGscore group resembles the immune-inflamed phenotype, whereas the low-score group is analogous to the immune-excluded or metabolic phenotype. Additionally, the high-NRGscore group is more sensitive to immune checkpoint blockade-based immunotherapy, which was further validated in an external HCC cohort, a metastatic melanoma cohort and an advanced urothelial cancer cohort. The NRGscore was also shown to be a potential biomarker for chemotherapy: patients with high NRGscores, who had a greater tumor stem cell composition, may be more sensitive to cisplatin-, doxorubicin- and paclitaxel-based chemotherapy and to sorafenib therapy.
Collectively, this comprehensive characterization of necroptosis in HCC highlights its implications for predicting immune infiltration and immunotherapy response in HCC, providing promising strategies for treatment.
ABSTRACT
Long noncoding RNAs (lncRNAs) have been shown to play an important role in tumor biogenesis and prognosis. Glioma is a grade-classified cancer; however, knowledge of lncRNA function during glioma progression is still lacking. While previous studies have shown how lncRNAs epigenetically regulate protein-coding genes, it remains unclear how lncRNAs themselves are epigenetically regulated. In this study, we first systematically analyzed RNA-seq data across grade II, III and IV glioma samples. We identified 60 lncRNAs that are significantly differentially expressed over disease progression (DElncRNAs), including the well-known PVT1, HOTAIR and H19 and the rarely studied CARD8-AS and MIR4435-2HG. Second, by integrating HM450K methylation microarray data, we demonstrated that some of these lncRNAs are epigenetically regulated by methylation. Third, we developed a DESeq2-GSEA-ceRNA-survival analysis strategy to investigate their functions. In particular, MIR4435-2HG is highly expressed in high-grade glioma and may affect the EMT and TNFα signaling pathways by acting as a miRNA sponge for miR-125a-5p and miR-125b-5p, thereby increasing CD44 expression. Our results reveal the dynamic expression of lncRNAs in glioma progression and their epigenetic regulatory mechanisms.
Subjects
DNA Methylation , DNA, Neoplasm , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Glioma , MicroRNAs , RNA, Long Noncoding , RNA, Neoplasm , DNA, Neoplasm/genetics , DNA, Neoplasm/metabolism , Gene Expression Profiling , Glioma/genetics , Glioma/metabolism , Glioma/pathology , Humans , MicroRNAs/biosynthesis , MicroRNAs/genetics , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , RNA, Long Noncoding/biosynthesis , RNA, Long Noncoding/genetics , RNA, Neoplasm/biosynthesis , RNA, Neoplasm/genetics
ABSTRACT
Single-cell sequencing interrogates the sequence or chromatin information of individual cells using advanced next-generation sequencing technologies. It provides higher resolution of cellular differences and a better understanding of the genetic and epigenetic mechanisms underlying an individual cell's survival and adaptation to its microenvironment. However, single-cell sequencing and downstream data analysis are more challenging owing to the minimal amount of starting material, sample loss and contamination. In addition, because only picogram amounts of nucleic acid are available, heavy amplification is often needed during sample preparation, resulting in uneven coverage, noise and inaccurate quantification of sequencing data. These unique properties pose challenges and thus create high demand for computational methods tailored to single-cell sequencing data. Here we comprehensively survey current strategies and challenges for multiple types of single-cell sequencing, including single-cell transcriptome, genome and epigenome sequencing, beginning with a brief introduction to sequencing techniques for single cells.
Subjects
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cell Separation/instrumentation , Cell Separation/methods , Epigenesis, Genetic/genetics , Flow Cytometry/instrumentation , Flow Cytometry/methods , Genomics/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Laser Capture Microdissection/instrumentation , Laser Capture Microdissection/methods , Microfluidic Analytical Techniques/instrumentation , Microfluidic Analytical Techniques/methods , Polymorphism, Single Nucleotide/genetics , RNA/genetics , Sequence Analysis, DNA/instrumentation , Sequence Analysis, RNA/instrumentation , Single-Cell Analysis/instrumentation , Transcriptome/genetics
ABSTRACT
BACKGROUND: Understanding the embryonic stem cell (ESC) fate decision between self-renewal and proper differentiation is important for developmental biology and regenerative medicine. Attention has focused on mechanisms involving histone modifications, alternative pre-messenger RNA splicing and cell-cycle progression. However, their intricate interrelations and joint contributions to ESC fate decisions remain unclear. RESULTS: We analyze the transcriptomes and epigenomes of human ESCs and five types of differentiated cells. We identify thousands of alternatively spliced exons and reveal their developmental and lineage-dependent characteristics. Several histone modifications show dynamic changes in alternatively spliced exons, and three are strongly associated with 52.8% of alternative splicing events upon hESC differentiation. The histone modification-associated alternatively spliced genes predominantly function in the G2/M phases and in the ATM/ATR-mediated DNA damage response pathway for cell differentiation, whereas other alternatively spliced genes are enriched in the G1 phase and in pathways for self-renewal. These results imply a potential epigenetic mechanism by which some histone modifications contribute to ESC fate decisions through the regulation of alternative splicing in specific pathways and cell-cycle genes. Supported by experimental validations and extended datasets from the Roadmap/ENCODE projects, we exemplify this mechanism with a cell-cycle-related transcription factor, PBX1, which regulates the pluripotency regulatory network by binding to NANOG. We suggest that the isoform switch from PBX1a to PBX1b links H3K36me3 to hESC fate determination through the PSIP1/SRSF1 adaptor, which results in exon skipping in PBX1. CONCLUSION: We reveal the mechanism by which alternative splicing links histone modifications to stem cell fate decisions.