ABSTRACT
Single-cell analysis across multiple samples and conditions requires quantitative modeling of the interplay between the continuum of cell states and the technical and biological sources of sample-to-sample variability. We introduce GEDI, a generative model that identifies latent space variations in multi-sample, multi-condition single-cell datasets and attributes them to sample-level covariates. GEDI enables cross-sample cell state mapping on par with state-of-the-art integration methods, cluster-free differential gene expression analysis along the continuum of cell states, and machine learning-based prediction of sample characteristics from single-cell data. GEDI can also incorporate gene-level prior knowledge to infer pathway and regulatory network activities in single cells. Finally, GEDI extends all these concepts to previously unexplored modalities that require joint consideration of dual measurements, such as the joint analysis of exon inclusion/exclusion reads to model alternative cassette exon splicing, or spliced/unspliced reads to model the mRNA stability landscapes of single cells.
Subjects
Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Gene Expression Profiling/methods, Machine Learning, Gene Regulatory Networks, Computational Biology/methods, Algorithms, Alternative Splicing
ABSTRACT
Accurate quantification of transcript isoforms is crucial for understanding gene regulation, functional diversity, and cellular behavior. Existing RNA sequencing methods have significant limitations: short-read (SR) sequencing provides high depth but struggles with isoform deconvolution, whereas long-read (LR) sequencing offers isoform resolution at the cost of lower depth, higher noise, and technical biases. Addressing this gap, we introduce Multi-Platform Aggregation and Quantification of Transcripts (MPAQT), a generative model that combines the complementary strengths of different sequencing platforms to achieve state-of-the-art isoform-resolved transcript quantification, as demonstrated by extensive simulations and experimental benchmarks. By applying MPAQT to an in vitro model of human embryonic stem cell differentiation into cortical neurons, followed by machine learning-based modeling of transcript abundances, we show that untranslated regions (UTRs) are major determinants of isoform proportion and exon usage; this effect is mediated through isoform-specific sequence features embedded in UTRs, which likely interact with RNA-binding proteins that modulate mRNA stability. These findings highlight MPAQT's potential to enhance our understanding of transcriptomic complexity and underline the role of splicing-independent post-transcriptional mechanisms in shaping the isoform and exon usage landscape of the cell.
ABSTRACT
Aberrant alternative splicing is a hallmark of cancer, yet the underlying regulatory programs that control this process remain largely unknown. Here, we report a systematic effort to decipher the RNA structural code that shapes pathological splicing during breast cancer metastasis. We discovered a previously unknown structural splicing enhancer that is enriched near cassette exons with increased inclusion in highly metastatic cells. We show that the spliceosomal protein small nuclear ribonucleoprotein polypeptide A' (SNRPA1) interacts with these enhancers to promote cassette exon inclusion. This interaction enhances metastatic lung colonization and cancer cell invasion, in part through SNRPA1-mediated regulation of PLEC alternative splicing, which can be counteracted by splicing-modulating morpholinos. Our findings establish a noncanonical regulatory role for SNRPA1 as a prometastatic splicing enhancer in breast cancer.