Search | BVS Search Portal

Biophysical modeling with variational autoencoders for bimodal, single-cell RNA sequencing data.

Carilli, Maria; Gorin, Gennady; Choi, Yongin; Chari, Tara; Pachter, Lior.

Nat Methods ; 21(8): 1466-1469, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39054391

ABSTRACT

Here we present biVI, which combines the variational autoencoder framework of scVI with biophysical models describing the transcription and splicing kinetics of RNA molecules. We demonstrate on simulated and experimental single-cell RNA sequencing data that biVI retains the variational autoencoder's ability to capture cell type structure in a low-dimensional space while further enabling genome-wide exploration of the biophysical mechanisms, such as system burst sizes and degradation rates, that underlie observations.

Subjects

Sequence Analysis, RNA; Single-Cell Analysis; Single-Cell Analysis/methods; Sequence Analysis, RNA/methods; Humans; RNA Splicing; Algorithms; RNA/genetics; RNA/chemistry

Spectral neural approximations for models of transcriptional dynamics.

Gorin, Gennady; Carilli, Maria; Chari, Tara; Pachter, Lior.

Biophys J ; 123(17): 2892-2901, 2024 Sep 03.

Article in English | MEDLINE | ID: mdl-38715358

ABSTRACT

The advent of high-throughput transcriptomics provides an opportunity to advance mechanistic understanding of transcriptional processes and their connections to cellular function at an unprecedented, genome-wide scale. These transcriptional systems, which involve discrete stochastic events, are naturally modeled using chemical master equations (CMEs), which can be solved for probability distributions to fit biophysical rates that govern system dynamics. While CME models have been used as standards in fluorescence transcriptomics for decades to analyze single-species RNA distributions, there are often no closed-form solutions to CMEs that model multiple species, such as nascent and mature RNA transcript counts. This has prevented the application of standard likelihood-based statistical methods for analyzing high-throughput, multi-species transcriptomic datasets using biophysical models. Inspired by recent work in machine learning to learn solutions to complex dynamical systems, we leverage neural networks and statistical understanding of system distributions to produce accurate approximations to a steady-state bivariate distribution for a model of the RNA life cycle that includes nascent and mature molecules. The steady-state distribution to this simple model has no closed-form solution and requires intensive numerical solving techniques: our approach reduces likelihood evaluation time by several orders of magnitude. We demonstrate two approaches, whereby solutions are approximated by 1) learning the weights of kernel distributions with constrained parameters or 2) learning both weights and scaling factors for parameters of kernel distributions. We show that our strategies, denoted by kernel weight regression and parameter-scaled kernel weight regression, respectively, enable broad exploration of parameter space and can be used in existing likelihood frameworks to infer transcriptional burst sizes, RNA splicing rates, and mRNA degradation rates from experimental transcriptomic data.

Subjects

Transcription, Genetic; Neural Networks, Computer; Models, Genetic

CST does not evict elongating telomerase but prevents initiation by ssDNA binding.

Zaug, Arthur J; Lim, Ci Ji; Olson, Conner L; Carilli, Maria T; Goodrich, Karen J; Wuttke, Deborah S; Cech, Thomas R.

Nucleic Acids Res ; 49(20): 11653-11665, 2021 Nov 18.

Article in English | MEDLINE | ID: mdl-34718732

ABSTRACT

The CST complex (CTC1-STN1-TEN1) has been shown to inhibit telomerase extension of the G-strand of telomeres and facilitate the switch to C-strand synthesis by DNA polymerase alpha-primase (pol α-primase). Recently the structure of human CST was solved by cryo-EM, allowing the design of mutant proteins defective in telomeric ssDNA binding and prompting the reexamination of CST inhibition of telomerase. The previous proposal that human CST inhibits telomerase by sequestration of the DNA primer was tested with a series of DNA-binding mutants of CST and modeled by a competitive binding simulation. The DNA-binding mutants had substantially reduced ability to inhibit telomerase, as predicted from their reduced affinity for telomeric DNA. These results provide strong support for the previous primer sequestration model. We then tested whether addition of CST to an ongoing processive telomerase reaction would terminate DNA extension. Pulse-chase telomerase reactions with addition of either wild-type CST or DNA-binding mutants showed that CST has no detectable ability to terminate ongoing telomerase extension in vitro. The same lack of inhibition was observed with or without pol α-primase bound to CST. These results suggest how the switch from telomerase extension to C-strand synthesis may occur.

Subjects

DNA, Single-Stranded/metabolism; Telomerase/metabolism; Telomere-Binding Proteins/metabolism; DNA Polymerase I/metabolism; DNA Primase/metabolism; DNA, Single-Stranded/chemistry; DNA, Single-Stranded/genetics; HEK293 Cells; Humans; Mutation; Protein Binding; Telomerase/chemistry

Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression.

Luebbert, Laura; Sullivan, Delaney K; Carilli, Maria; Hjörleifsson, Kristján Eldjárn; Winnett, Alexander Viloria; Chari, Tara; Pachter, Lior.

bioRxiv ; 2024 May 04.

Article in English | MEDLINE | ID: mdl-38168363

ABSTRACT

There are an estimated 300,000 mammalian viruses from which infectious diseases in humans may arise. They inhabit human tissues such as the lungs, blood, and brain and often remain undetected. Efficient and accurate detection of viral infection is vital to understanding its impact on human health and to make accurate predictions to limit adverse effects, such as future epidemics. The increasing use of high-throughput sequencing methods in research, agriculture, and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on highly conserved amino acid domains, which enables the detection of RNA viruses covering up to 10^12 virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We applied our method to identify putative novel viruses in rhesus macaque PBMC data that display cell type specificity and whose presence correlates with altered host gene expression.

Biophysical modeling with variational autoencoders for bimodal, single-cell RNA sequencing data.

Carilli, Maria; Gorin, Gennady; Choi, Yongin; Chari, Tara; Pachter, Lior.

bioRxiv ; 2023 May 02.

Article in English | MEDLINE | ID: mdl-36712140

ABSTRACT

We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. While previous approaches to integrate bimodal data via the variational autoencoder framework ignore the causal relationship between measurements, biVI models the biophysical processes that give rise to observations. We demonstrate through simulated benchmarking that biVI captures cell type structure in a low-dimensional space and accurately recapitulates parameter values and copy number distributions. On biological data, biVI provides a scalable route for identifying the biophysical mechanisms underlying gene expression. This analytical approach outlines a generalizable strategy for treating multimodal datasets generated by high-throughput, single-cell genomic assays.
