1.
Nature ; 629(8014): 1165-1173, 2024 May.
Article in English | MEDLINE | ID: mdl-38720076

ABSTRACT

The nucleus is highly organized, such that factors involved in the transcription and processing of distinct classes of RNA are confined within specific nuclear bodies [1,2]. One example is the nuclear speckle, which is defined by high concentrations of protein and noncoding RNA regulators of pre-mRNA splicing [3]. What functional role, if any, speckles might play in the process of mRNA splicing is unclear [4,5]. Here we show that genes localized near nuclear speckles display higher spliceosome concentrations, increased spliceosome binding to their pre-mRNAs and higher co-transcriptional splicing levels than genes that are located farther from nuclear speckles. Gene organization around nuclear speckles is dynamic between cell types, and changes in speckle proximity lead to differences in splicing efficiency. Finally, directed recruitment of a pre-mRNA to nuclear speckles is sufficient to increase mRNA splicing levels. Together, our results integrate the long-standing observations of nuclear speckles with the biochemistry of mRNA splicing and demonstrate a crucial role for dynamic three-dimensional spatial organization of genomic DNA in driving spliceosome concentrations and controlling the efficiency of mRNA splicing.


Subject(s)
Genome , Nuclear Speckles , RNA Precursors , RNA Splicing , RNA, Messenger , Spliceosomes , Animals , Humans , Male , Mice , Genes , Genome/genetics , Human Embryonic Stem Cells/metabolism , Mouse Embryonic Stem Cells/metabolism , Nuclear Speckles/genetics , Nuclear Speckles/metabolism , RNA Precursors/metabolism , RNA Precursors/genetics , RNA Splicing/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Spliceosomes/metabolism , Transcription, Genetic
2.
Nat Methods ; 21(8): 1466-1469, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39054391

ABSTRACT

Here we present biVI, which combines the variational autoencoder framework of scVI with biophysical models describing the transcription and splicing kinetics of RNA molecules. We demonstrate on simulated and experimental single-cell RNA sequencing data that biVI retains the variational autoencoder's ability to capture cell type structure in a low-dimensional space while further enabling genome-wide exploration of the biophysical mechanisms, such as system burst sizes and degradation rates, that underlie observations.


Subject(s)
Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , RNA Splicing , Algorithms , RNA/genetics , RNA/chemistry
3.
Biophys J ; 123(17): 2892-2901, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-38715358

ABSTRACT

The advent of high-throughput transcriptomics provides an opportunity to advance mechanistic understanding of transcriptional processes and their connections to cellular function at an unprecedented, genome-wide scale. These transcriptional systems, which involve discrete stochastic events, are naturally modeled using chemical master equations (CMEs), which can be solved for probability distributions to fit biophysical rates that govern system dynamics. While CME models have been used as standards in fluorescence transcriptomics for decades to analyze single-species RNA distributions, there are often no closed-form solutions to CMEs that model multiple species, such as nascent and mature RNA transcript counts. This has prevented the application of standard likelihood-based statistical methods for analyzing high-throughput, multi-species transcriptomic datasets using biophysical models. Inspired by recent work in machine learning to learn solutions to complex dynamical systems, we leverage neural networks and statistical understanding of system distributions to produce accurate approximations to a steady-state bivariate distribution for a model of the RNA life cycle that includes nascent and mature molecules. The steady-state distribution of this simple model has no closed-form solution and requires intensive numerical solving techniques: our approach reduces likelihood evaluation time by several orders of magnitude. We demonstrate two approaches, whereby solutions are approximated by 1) learning the weights of kernel distributions with constrained parameters or 2) learning both weights and scaling factors for parameters of kernel distributions.
We show that our strategies, denoted by kernel weight regression and parameter-scaled kernel weight regression, respectively, enable broad exploration of parameter space and can be used in existing likelihood frameworks to infer transcriptional burst sizes, RNA splicing rates, and mRNA degradation rates from experimental transcriptomic data.
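Although the full bivariate distribution has no closed form, the model can be simulated exactly with Gillespie's stochastic simulation algorithm, and its first moments are known. The sketch below assumes a common form of the bursty nascent/mature model (bursts at rate k with geometrically distributed sizes of mean b, splicing at rate beta per nascent molecule, degradation at rate gamma per mature molecule). The function name and parameters are illustrative assumptions, and this is not the authors' kernel weight regression code. Time-averaged counts should approach E[N] = k*b/beta and E[M] = k*b/gamma.

```python
import random

def simulate_bursty_nascent_mature(k, b, beta, gamma, t_max, seed=0, burn_in=50.0):
    """Gillespie simulation of an assumed bursty nascent (N) -> mature (M) model.

    Assumed reactions (illustrative, not taken from the paper's code):
      burst:       0 -> B nascent at rate k, B geometric on {0, 1, 2, ...} with mean b
      splicing:    N -> M at rate beta * N
      degradation: M -> 0 at rate gamma * M
    Returns the time-averaged N and M over the window [burn_in, t_max].
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + b)  # success probability giving a mean of b failures
    t, n, m = 0.0, 0, 0
    sum_n = sum_m = 0.0
    while t < t_max:
        rates = (k, beta * n, gamma * m)
        total = sum(rates)  # always >= k > 0
        dt = rng.expovariate(total)
        # Accumulate the time-weighted state over the part of [t, t + dt] inside the window.
        start, end = max(t, burn_in), min(t + dt, t_max)
        if end > start:
            sum_n += n * (end - start)
            sum_m += m * (end - start)
        t += dt
        if t >= t_max:
            break
        r = rng.random() * total
        if r < rates[0]:
            burst = 0
            while rng.random() > p:  # count failures before the first success
                burst += 1
            n += burst
        elif r < rates[0] + rates[1]:
            n -= 1  # one nascent molecule is spliced
            m += 1
        elif m > 0:
            m -= 1  # one mature molecule is degraded
    window = t_max - burn_in
    return sum_n / window, sum_m / window
```

For example, with k=1.0, b=5.0, beta=2.0, gamma=0.5 and t_max=5000.0 the returned means should land near 2.5 and 10. Estimating whole distributions this way for every candidate parameter set is slow, which illustrates why fast likelihood approximations are useful.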


Subject(s)
Transcription, Genetic , Neural Networks, Computer , Models, Genetic
4.
PLoS Comput Biol ; 19(8): e1011288, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37590228

ABSTRACT

Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.
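A toy case makes the geometric point concrete. Ten points placed at the standard basis vectors of a 10-dimensional space are all sqrt(2) apart, but at most three points in a plane can be mutually equidistant, so any 2-D embedding must distort some of these distances. The hypothetical sketch below uses PCA (via numpy's SVD) as the embedding; it is an illustration only, not the paper's own analysis.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their top two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    retained = (S[:2] ** 2).sum() / (S ** 2).sum()  # fraction of variance kept
    return Xc @ Vt[:2].T, retained

def distance_spread(Y):
    """Ratio of the largest to the smallest pairwise Euclidean distance among rows of Y."""
    d = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    off = d[~np.eye(len(Y), dtype=bool)]  # drop the zero self-distances
    return off.max() / off.min()

X = np.eye(10)                 # 10 equidistant points in 10 dimensions
Y, retained = pca_2d(X)
print(distance_spread(X))      # 1.0: every pair is sqrt(2) apart
print(distance_spread(Y) > 1)  # True: the 2-D embedding cannot keep them equidistant
print(retained)                # about 2/9 of the total variance
```

Here the two retained components hold only 2/9 of the variance, because the centered points spread equally across nine directions.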


Subject(s)
Data Analysis , Genomics , Humans
5.
PLoS Comput Biol ; 18(9): e1010492, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36094956

ABSTRACT

We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.
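Many popular implementations build on a simple deterministic per-gene model, du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s, for unspliced (u) and spliced (s) counts. A minimal sketch of the "steady-state" velocity estimate, assuming beta = 1 and that cells at the extremes of s are near equilibrium, fits gamma by least squares through the origin on those cells and reports velocity as u - gamma*s. These are the kinds of assumptions the analysis above examines; the function name and the 5% quantile default are illustrative choices, not a specific package's API.

```python
import numpy as np

def steady_state_velocity(u, s, quantile=0.05):
    """Sketch of a steady-state RNA velocity estimate for one gene (assumed setup).

    u, s: unspliced and spliced counts across cells (1-D arrays).
    Assumes beta = 1, so ds/dt = u - gamma * s. gamma is fit by least squares
    through the origin using only cells in the lower and upper `quantile` of s,
    which are assumed to be close to steady state.
    Returns (gamma, velocity) with velocity = u - gamma * s.
    """
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    lo, hi = np.quantile(s, [quantile, 1.0 - quantile])
    mask = (s <= lo) | (s >= hi)  # extreme cells used for the fit
    gamma = np.dot(u[mask], s[mask]) / np.dot(s[mask], s[mask])
    return gamma, u - gamma * s
```

Cells with positive velocity carry more unspliced RNA than the fitted steady-state ratio predicts, which is usually read as the gene being induced; the estimate is only as good as the steady-state and constant-rate assumptions behind it.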


Subject(s)
RNA , RNA/genetics , Workflow
6.
Nat Comput Sci ; 4(9): 677-689, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39317762

ABSTRACT

Multimodal, single-cell genomics technologies enable simultaneous measurement of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell populations, such as regulation of cell fate by transcriptional stochasticity or tumor proliferation through aberrant splicing dynamics. However, current methods for determining cell types or 'clusters' in multimodal data often rely on ad hoc approaches to balance or integrate measurements, and assumptions ignoring inherent properties of the data. To enable interpretable and consistent cell cluster determination, we present meK-means (mechanistic K-means) which integrates modalities through a unifying model of transcription to learn underlying, shared biophysical states. With meK-means we can cluster cells with nascent and mature mRNA measurements, utilizing the causal, physical relationships between these modalities. This identifies shared transcription dynamics across cells, which induce the observed molecule counts, and provides an alternative definition for 'clusters' through the governing parameters of cellular processes.


Subject(s)
Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Transcriptome/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Genomics/methods , Gene Expression Profiling/methods , Cluster Analysis , Sequence Analysis, RNA/methods , Algorithms , Transcription, Genetic
7.
bioRxiv ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39005347

ABSTRACT

Recent advances in high-throughput, multi-condition experiments allow for genome-wide investigation of how perturbations affect transcription and translation in the cell across multiple biological entities or modalities, from chromatin and mRNA information to protein production and spatial morphology. This presents an unprecedented opportunity to unravel how the processes of DNA and RNA regulation direct cell fate determination and disease response. Most methods designed for analyzing large-scale perturbation data focus on the observational outcomes, e.g., expression; however, many potential transcriptional mechanisms, such as transcriptional bursting or splicing dynamics, can underlie these complex and noisy observations. In this analysis, we demonstrate how a stochastic biophysical modeling approach to interpreting high-throughput perturbation data enables deeper investigation of the 'how' behind such molecular measurements. Our approach takes advantage of modalities already present in data produced with current technologies, such as nascent and mature mRNA measurements, to illuminate transcriptional dynamics induced by perturbation, predict kinetic behaviors in new perturbation settings, and uncover novel populations of cells with distinct kinetic responses to perturbation.

8.
bioRxiv ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38168363

ABSTRACT

There are an estimated 300,000 mammalian viruses from which infectious diseases in humans may arise. They inhabit human tissues such as the lungs, blood, and brain and often remain undetected. Efficient and accurate detection of viral infection is vital to understanding its impact on human health and to make accurate predictions to limit adverse effects, such as future epidemics. The increasing use of high-throughput sequencing methods in research, agriculture, and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on highly conserved amino acid domains, which enables the detection of RNA viruses covering up to 10^12 virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We applied our method to identify putative novel viruses in rhesus macaque PBMC data that display cell type specificity and whose presence correlates with altered host gene expression.

9.
bioRxiv ; 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39071320

ABSTRACT

Spatial homogeneous regions (SHRs) in tissues are domains that are homogeneous with respect to cell type composition. We present a method for identifying SHRs using spatial transcriptomics data, and demonstrate that it is efficient and effective at finding SHRs for a wide variety of tissue types. The method is implemented in a tool called concordex, which relies on analysis of k-nearest-neighbor (kNN) graphs. The concordex tool is also useful for analysis of non-spatial transcriptomics data, and can elucidate the extent of concordance between partitions of cells derived from clustering algorithms, and transcriptomic similarity as represented in kNN graphs.
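The central object here is a kNN graph. A simplified sketch of the idea, assuming brute-force Euclidean neighbours and a plain label-agreement score (the concordex tool's exact statistic and its SHR detection step may differ), computes for each cell the fraction of its k neighbours carrying each label, then averages the fraction that matches the cell's own label. All function names below are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Brute-force k nearest neighbours of each row of X (self excluded), Euclidean distance."""
    X = np.asarray(X, dtype=float)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def neighborhood_composition(X, labels, k):
    """Row i holds the fraction of cell i's k neighbours carrying each label."""
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique labels
    nbr = knn_indices(X, k)
    comp = np.stack([(labels[nbr] == c).mean(axis=1) for c in classes], axis=1)
    return classes, comp

def concordance(X, labels, k):
    """Mean fraction of neighbours that share each cell's own label (simplified score)."""
    classes, comp = neighborhood_composition(X, labels, k)
    own = np.searchsorted(classes, np.asarray(labels))  # column of each cell's label
    return comp[np.arange(len(own)), own].mean()
```

Labels that agree with the graph give scores near 1, while labels that cut across it score lower. When X holds spatial coordinates, the rows of the composition matrix could also be clustered to look for regions with homogeneous neighbourhoods.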

10.
bioRxiv ; 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37745403

ABSTRACT

Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells, to the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or 'clusters' present in multimodal data often rely on ad hoc or independent treatment of modalities, and assumptions ignoring inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means) which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for 'clusters' through the governing parameters of cellular processes.

11.
bioRxiv ; 2023 May 02.
Article in English | MEDLINE | ID: mdl-36712140

ABSTRACT

We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. While previous approaches to integrate bimodal data via the variational autoencoder framework ignore the causal relationship between measurements, biVI models the biophysical processes that give rise to observations. We demonstrate through simulated benchmarking that biVI captures cell type structure in a low-dimensional space and accurately recapitulates parameter values and copy number distributions. On biological data, biVI provides a scalable route for identifying the biophysical mechanisms underlying gene expression. This analytical approach outlines a generalizable strategy for treating multimodal datasets generated by high-throughput, single-cell genomic assays.

12.
Sci Adv ; 7(48): eabh1683, 2021 Nov 26.
Article in English | MEDLINE | ID: mdl-34826233

ABSTRACT

We present an organism-wide, transcriptomic cell atlas of the hydrozoan medusa Clytia hemisphaerica and describe how its component cell types respond to perturbation. Using multiplexed single-cell RNA sequencing, in which individual animals were indexed and pooled from control and perturbation conditions into a single sequencing run, we avoid artifacts from batch effects and are able to discern shifts in cell state in response to organismal perturbations. This work serves as a foundation for future studies of development, function, and regeneration in a genetically tractable jellyfish species. Moreover, we introduce a powerful workflow for high-resolution, whole-animal, multiplexed single-cell genomics that is readily adaptable to other traditional or nontraditional model organisms.
