Results 1 - 20 of 170
1.
Nat Protoc ; 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39390263

ABSTRACT

The term 'RNA-seq' refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, single cells or single nuclei. The kallisto, bustools and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data. Execution of this protocol requires basic familiarity with a command line environment. With this protocol, quantification of a moderately sized RNA-seq dataset can be completed within minutes.
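
As a rough illustration of the kind of invocation this workflow involves, the sketch below assembles a standard `kb count` command line in Python. The file names, the `10xv3` technology string and the helper `build_kb_count_command` are assumptions made for illustration and are not taken from the protocol; the kb-python documentation lists the options that apply to a given assay.

```python
import shlex
from typing import List, Sequence


def build_kb_count_command(
    index: str,
    t2g: str,
    technology: str,
    out_dir: str,
    fastqs: Sequence[str],
) -> List[str]:
    """Assemble argv for a standard `kb count` run (hypothetical helper)."""
    if not fastqs:
        raise ValueError("at least one FASTQ file is required")
    return [
        "kb", "count",
        "-i", index,        # kallisto index, built beforehand with `kb ref`
        "-g", t2g,          # transcript-to-gene mapping file
        "-x", technology,   # technology string (assumed here: 10xv3)
        "-o", out_dir,      # output directory
        *fastqs,
    ]


# All file names below are placeholders.
cmd = build_kb_count_command(
    "index.idx", "t2g.txt", "10xv3", "kb_out",
    ["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
)
print(shlex.join(cmd))
# To execute (requires kb-python to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the command easy to log and inspect before any compute time is spent.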

2.
Nat Comput Sci ; 4(9): 677-689, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39317762

ABSTRACT

Multimodal, single-cell genomics technologies enable simultaneous measurement of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell populations, such as regulation of cell fate by transcriptional stochasticity or tumor proliferation through aberrant splicing dynamics. However, current methods for determining cell types or 'clusters' in multimodal data often rely on ad hoc approaches to balance or integrate measurements, and assumptions ignoring inherent properties of the data. To enable interpretable and consistent cell cluster determination, we present meK-means (mechanistic K-means) which integrates modalities through a unifying model of transcription to learn underlying, shared biophysical states. With meK-means we can cluster cells with nascent and mature mRNA measurements, utilizing the causal, physical relationships between these modalities. This identifies shared transcription dynamics across cells, which induce the observed molecule counts, and provides an alternative definition for 'clusters' through the governing parameters of cellular processes.


Subject(s)
Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Transcriptome/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Genomics/methods , Gene Expression Profiling/methods , Cluster Analysis , Sequence Analysis, RNA/methods , Algorithms , Transcription, Genetic
3.
Nat Methods ; 21(8): 1466-1469, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39054391

ABSTRACT

Here we present biVI, which combines the variational autoencoder framework of scVI with biophysical models describing the transcription and splicing kinetics of RNA molecules. We demonstrate on simulated and experimental single-cell RNA sequencing data that biVI retains the variational autoencoder's ability to capture cell type structure in a low-dimensional space while further enabling genome-wide exploration of the biophysical mechanisms, such as system burst sizes and degradation rates, that underlie observations.


Subject(s)
Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , RNA Splicing , Algorithms , RNA/genetics , RNA/chemistry
4.
bioRxiv ; 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39071320

ABSTRACT

Spatial homogeneous regions (SHRs) in tissues are domains that are homogeneous with respect to cell type composition. We present a method for identifying SHRs using spatial transcriptomics data, and demonstrate that it is efficient and effective at finding SHRs for a wide variety of tissue types. The method is implemented in a tool called concordex, which relies on analysis of k-nearest-neighbor (kNN) graphs. The concordex tool is also useful for analysis of non-spatial transcriptomics data, and can elucidate the extent of concordance between partitions of cells derived from clustering algorithms, and transcriptomic similarity as represented in kNN graphs.
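
To make the idea of comparing a partition of cells against a kNN graph concrete, here is a minimal sketch, assuming a brute-force Euclidean kNN search and a simple "fraction of neighbors sharing my label" score. This is not the concordex statistic or implementation; the function names and the toy coordinates are hypothetical.

```python
import math
from typing import Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_indices(points: Sequence[Point], k: int) -> List[List[int]]:
    """For each point, return the indices of its k nearest neighbors (brute force)."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def same_label_fraction(
    points: Sequence[Point], labels: Sequence[Hashable], k: int
) -> float:
    """Mean fraction of each point's k nearest neighbors that share its label."""
    neighbors = knn_indices(points, k)
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        total += sum(labels[j] == labels[i] for j in nbrs) / k
    return total / len(points)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
matched = ["A", "A", "A", "B", "B", "B"]      # labels follow the geometry
scrambled = ["A", "B", "A", "B", "A", "B"]    # labels ignore the geometry

print(same_label_fraction(points, matched, k=2))
print(same_label_fraction(points, scrambled, k=2))
```

A score near 1 means the labels agree with the neighborhood structure; lower scores indicate that the partition cuts across neighborhoods.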

5.
bioRxiv ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39005347

ABSTRACT

Recent advances in high-throughput, multi-condition experiments allow for genome-wide investigation of how perturbations affect transcription and translation in the cell across multiple biological entities or modalities, from chromatin and mRNA information to protein production and spatial morphology. This presents an unprecedented opportunity to unravel how the processes of DNA and RNA regulation direct cell fate determination and disease response. Most methods designed for analyzing large-scale perturbation data focus on the observational outcomes, e.g., expression; however, many potential transcriptional mechanisms, such as transcriptional bursting or splicing dynamics, can underlie these complex and noisy observations. In this analysis, we demonstrate how a stochastic biophysical modeling approach to interpreting high-throughput perturbation data enables deeper investigation of the 'how' behind such molecular measurements. Our approach takes advantage of modalities already present in data produced with current technologies, such as nascent and mature mRNA measurements, to illuminate transcriptional dynamics induced by perturbation, predict kinetic behaviors in new perturbation settings, and uncover novel populations of cells with distinct kinetic responses to perturbation.

6.
bioRxiv ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39071335

ABSTRACT

RNA abundance quantification has become routine and affordable thanks to high-throughput "short-read" technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. "Long-read" sequencing platforms now produce data types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read data types, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.

7.
Genome Biol Evol ; 16(6)2024 06 04.
Article in English | MEDLINE | ID: mdl-38922665

ABSTRACT

Molecular studies of animal regeneration typically focus on conserved genes and signaling pathways that underlie morphogenesis. To date, a holistic analysis of gene expression across animals has not been attempted, as it presents a suite of problems related to differences in experimental design and gene homology. By combining orthology analyses with a novel statistical method for testing gene enrichment across large data sets, we are able to test whether tissue regeneration across animals shares transcriptional regulation. We applied this method to a meta-analysis of six publicly available RNA-Seq data sets from diverse examples of animal regeneration. We recovered 160 conserved orthologous gene clusters, which are enriched in structural genes as opposed to those regulating morphogenesis. A breakdown of gene presence/absence provides limited support for the conservation of pathways typically implicated in regeneration, such as Wnt signaling and cell pluripotency pathways. Such pathways are only conserved if we permit large amounts of paralog switching through evolution. Overall, our analysis does not support the hypothesis that a shared set of ancestral genes underlies regeneration mechanisms in animals. After applying the same method to heat shock studies and getting similar results, we raise broader questions about the ability of comparative RNA-Seq to reveal conserved gene pathways across deep evolutionary relationships.


Subject(s)
RNA-Seq , Regeneration , Animals , Regeneration/genetics , Evolution, Molecular , Sequence Analysis, RNA
8.
Bioinformatics ; 40(6)2024 06 03.
Article in English | MEDLINE | ID: mdl-38876979

ABSTRACT

MOTIVATION: Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. RESULTS: We present a tool called splitcode that enables flexible and efficient parsing, interpreting, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays. AVAILABILITY AND IMPLEMENTATION: The splitcode program is available at http://github.com/pachterlab/splitcode.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Gene Library
9.
Nat Med ; 30(6): 1636-1644, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38867077

ABSTRACT

Despite recent therapeutic advances, metastatic castration-resistant prostate cancer (mCRPC) remains lethal. Chimeric antigen receptor (CAR) T cell therapies have demonstrated durable remissions in hematological malignancies. We report results from a phase 1, first-in-human study of prostate stem cell antigen (PSCA)-directed CAR T cells in men with mCRPC. The starting dose level (DL) was 100 million (M) CAR T cells without lymphodepletion (LD), followed by incorporation of LD. The primary end points were safety and dose-limiting toxicities (DLTs). No DLTs were observed at DL1, with a DLT of grade 3 cystitis encountered at DL2, resulting in addition of a new cohort using a reduced LD regimen + 100 M CAR T cells (DL3). No DLTs were observed in DL3. Cytokine release syndrome of grade 1 or 2 occurred in 5 of 14 treated patients. Prostate-specific antigen declines (>30%) occurred in 4 of 14 patients, as well as radiographic improvements. Dynamic changes indicating activation of peripheral blood endogenous and CAR T cell subsets, TCR repertoire diversity and changes in the tumor immune microenvironment were observed in a subset of patients. Limited persistence of CAR T cells was observed beyond 28 days post-infusion. These results support future clinical studies to optimize dosing and combination strategies to improve durable therapeutic outcomes. ClinicalTrials.gov identifier NCT03873805 .


Subject(s)
Antigens, Neoplasm , GPI-Linked Proteins , Immunotherapy, Adoptive , Neoplasm Proteins , Prostatic Neoplasms, Castration-Resistant , Humans , Male , Prostatic Neoplasms, Castration-Resistant/therapy , Prostatic Neoplasms, Castration-Resistant/immunology , Prostatic Neoplasms, Castration-Resistant/pathology , Aged , Middle Aged , Antigens, Neoplasm/immunology , Immunotherapy, Adoptive/adverse effects , Immunotherapy, Adoptive/methods , GPI-Linked Proteins/immunology , Neoplasm Proteins/immunology , Receptors, Chimeric Antigen/immunology , Neoplasm Metastasis , T-Lymphocytes/immunology , T-Lymphocytes/transplantation , Prostate-Specific Antigen/blood
10.
Biophys J ; 123(17): 2892-2901, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-38715358

ABSTRACT

The advent of high-throughput transcriptomics provides an opportunity to advance mechanistic understanding of transcriptional processes and their connections to cellular function at an unprecedented, genome-wide scale. These transcriptional systems, which involve discrete stochastic events, are naturally modeled using chemical master equations (CMEs), which can be solved for probability distributions to fit biophysical rates that govern system dynamics. While CME models have been used as standards in fluorescence transcriptomics for decades to analyze single-species RNA distributions, there are often no closed-form solutions to CMEs that model multiple species, such as nascent and mature RNA transcript counts. This has prevented the application of standard likelihood-based statistical methods for analyzing high-throughput, multi-species transcriptomic datasets using biophysical models. Inspired by recent work in machine learning to learn solutions to complex dynamical systems, we leverage neural networks and statistical understanding of system distributions to produce accurate approximations to a steady-state bivariate distribution for a model of the RNA life cycle that includes nascent and mature molecules. The steady-state distribution to this simple model has no closed-form solution and requires intensive numerical solving techniques: our approach reduces likelihood evaluation time by several orders of magnitude. We demonstrate two approaches, whereby solutions are approximated by 1) learning the weights of kernel distributions with constrained parameters or 2) learning both weights and scaling factors for parameters of kernel distributions. We show that our strategies, denoted by kernel weight regression and parameter-scaled kernel weight regression, respectively, enable broad exploration of parameter space and can be used in existing likelihood frameworks to infer transcriptional burst sizes, RNA splicing rates, and mRNA degradation rates from experimental transcriptomic data.
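
For readers unfamiliar with what "solving a CME for a probability distribution" involves, the sketch below treats a far simpler case than the bivariate model in the abstract: a single RNA species produced at a constant rate `k` and degraded at rate `gamma` per molecule. This is an assumption made purely for illustration. For this one-species model the steady state follows from the flux balance k·p(n) = gamma·(n+1)·p(n+1) and is known to be Poisson with mean k/gamma, so the numerical result can be checked against a closed form. The rates and the truncation level `n_max` are arbitrary illustrative values.

```python
import math
from typing import List


def birth_death_steady_state(k: float, gamma: float, n_max: int) -> List[float]:
    """Steady state of the birth-death CME truncated to states 0..n_max.

    Uses the flux balance k * p[n] = gamma * (n + 1) * p[n + 1],
    then normalizes so the probabilities sum to one.
    """
    weights = [1.0]
    for n in range(n_max):
        weights.append(weights[-1] * k / (gamma * (n + 1)))
    total = sum(weights)
    return [w / total for w in weights]


def poisson_pmf(mean: float, n: int) -> float:
    """Poisson probability of observing n molecules."""
    return math.exp(-mean) * mean**n / math.factorial(n)


k, gamma = 6.0, 2.0  # illustrative rates; mean copy number is k / gamma = 3
p = birth_death_steady_state(k, gamma, n_max=60)
max_err = max(abs(p[n] - poisson_pmf(k / gamma, n)) for n in range(len(p)))
print(f"max |numerical - Poisson| = {max_err:.2e}")
```

Multi-species models such as the nascent/mature system generally lack this kind of shortcut, which is the difficulty the abstract describes.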


Subject(s)
Transcription, Genetic , Neural Networks, Computer , Models, Genetic
11.
Bioinformatics ; 40(4)2024 03 29.
Article in English | MEDLINE | ID: mdl-38579259

ABSTRACT

MOTIVATION: Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. RESULTS: We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. AVAILABILITY AND IMPLEMENTATION: The specification and associated seqspec command line tool are available at https://www.doi.org/10.5281/zenodo.10213865.


Subject(s)
Genomics , Software
12.
bioRxiv ; 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38617255

ABSTRACT

Standard single-cell RNA-sequencing analysis (scRNA-seq) workflows consist of converting raw read data into cell-gene count matrices through sequence alignment, followed by analyses including filtering, highly variable gene selection, dimensionality reduction, clustering, and differential expression analysis. Seurat and Scanpy are the most widely-used packages implementing such workflows, and are generally thought to implement individual steps similarly. We investigate in detail the algorithms and methods underlying Seurat and Scanpy and find that there are, in fact, considerable differences in the outputs of Seurat and Scanpy. The extent of differences between the programs is approximately equivalent to the variability that would be introduced in benchmarking scRNA-seq datasets by sequencing less than 5% of the reads or analyzing less than 20% of the cell population. Additionally, distinct versions of Seurat and Scanpy can produce very different results, especially during parts of differential expression analysis. Our analysis highlights the need for users of scRNA-seq to carefully assess the tools on which they rely, and the importance of developers of scientific software to prioritize transparency, consistency, and reproducibility for their tools.

13.
Bioinformatics ; 40(3)2024 03 04.
Article in English | MEDLINE | ID: mdl-38377393

ABSTRACT

MOTIVATION: Eukaryotic linear motifs (ELMs), or Short Linear Motifs, are protein interaction modules that play an essential role in cellular processes and signaling networks and are often involved in diseases like cancer. The ELM database is a collection of manually curated motif knowledge from scientific papers. It has become a crucial resource for investigating motif biology and recognizing candidate ELMs in novel amino acid sequences. Users can search amino acid sequences or UniProt Accessions on the ELM resource web interface. However, as with many web services, there are limitations in the swift processing of large-scale queries through the ELM web interface or API calls, and, therefore, integration into protein function analysis pipelines is limited. RESULTS: To allow swift, large-scale motif analyses on protein sequences using ELMs curated in the ELM database, we have extended the gget suite of Python and command line tools with a new module, gget elm, which does not rely on the ELM server for efficiently finding candidate ELMs in user-submitted amino acid sequences and UniProt Accessions. gget elm increases accessibility to the information stored in the ELM database and allows scalable searches for motif-mediated interaction sites in the amino acid sequences. AVAILABILITY AND IMPLEMENTATION: The manual and source code are available at https://github.com/pachterlab/gget.


Subject(s)
Proteins , Software , Amino Acid Motifs , Databases, Protein , Proteins/chemistry , Amino Acid Sequence
14.
Bioinform Adv ; 4(1): vbad181, 2024.
Article in English | MEDLINE | ID: mdl-38213823

ABSTRACT

Summary: Barcode-based sequence census assays utilize custom or random oligonucleotide sequences to label various biological features, such as cell-surface proteins or CRISPR perturbations. These assays all rely on barcode quantification, a task that is complicated by barcode design and technical noise. We introduce a modular approach to quantifying barcodes that achieves speed and memory improvements over existing tools. We also introduce a set of quality control metrics, and an accompanying tool, for validating barcode designs. Availability and implementation: https://github.com/pachterlab/kb_python, https://github.com/pachterlab/qcbc.
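
The two tasks named above, barcode quantification and barcode design checks, can be sketched in a few lines. This is a minimal illustration, assuming fixed-length barcodes, a known whitelist, and correction of at most one mismatch; it is not the kb_python or qcbc implementation, and all function names and sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(
    seq: str, whitelist: Sequence[str], max_mismatch: int = 1
) -> Optional[str]:
    """Return the whitelist barcode for seq, or None if absent or ambiguous."""
    if seq in whitelist:  # an exact match always wins
        return seq
    hits = [
        bc for bc in whitelist
        if len(bc) == len(seq) and hamming(seq, bc) <= max_mismatch
    ]
    return hits[0] if len(hits) == 1 else None


def count_barcodes(
    reads: Iterable[str], whitelist: Sequence[str], max_mismatch: int = 1
) -> Dict[str, int]:
    """Count reads per whitelist barcode, discarding unassignable reads."""
    counts: Counter = Counter()
    for seq in reads:
        bc = assign_barcode(seq, whitelist, max_mismatch)
        if bc is not None:
            counts[bc] += 1
    return dict(counts)


def min_pairwise_distance(whitelist: Sequence[str]) -> int:
    """Design check: smallest Hamming distance between any two barcodes."""
    return min(hamming(a, b) for a, b in combinations(whitelist, 2))


whitelist = ["ACGT", "TTTT"]                       # toy barcode design
reads = ["ACGT", "ACGA", "TTTT", "GGGG", "TTTA"]   # toy observed sequences
print(count_barcodes(reads, whitelist))
print(min_pairwise_distance(whitelist))
```

The design check matters because one-mismatch correction is only unambiguous when every pair of whitelist barcodes differs in at least three positions.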

15.
bioRxiv ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38168363

ABSTRACT

There are an estimated 300,000 mammalian viruses from which infectious diseases in humans may arise. They inhabit human tissues such as the lungs, blood, and brain and often remain undetected. Efficient and accurate detection of viral infection is vital to understanding its impact on human health and to make accurate predictions to limit adverse effects, such as future epidemics. The increasing use of high-throughput sequencing methods in research, agriculture, and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on highly conserved amino acid domains, which enables the detection of RNA viruses covering up to 10^12 virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We applied our method to identify putative novel viruses in rhesus macaque PBMC data that display cell type specificity and whose presence correlates with altered host gene expression.

16.
bioRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38045414

ABSTRACT

The term "RNA-seq" refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, from single cells, or from single nuclei. The kallisto, bustools, and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples, or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data.

18.
Bull Math Biol ; 85(11): 114, 2023 10 12.
Article in English | MEDLINE | ID: mdl-37828255

ABSTRACT

The serial nature of reactions involved in the RNA life-cycle motivates the incorporation of delays in models of transcriptional dynamics. The models couple a transcriptional process to a fairly general set of delayed monomolecular reactions with no feedback. We provide numerical strategies for calculating the RNA copy number distributions induced by these models, and solve several systems with splicing, degradation, and catalysis. An analysis of single-cell and single-nucleus RNA sequencing data using these models reveals that the kinetics of nuclear export do not appear to require invocation of a non-Markovian waiting time.


Subject(s)
Mathematical Concepts , Models, Biological , Stochastic Processes , Computer Simulation , RNA , Markov Chains , Algorithms
19.
Cell Syst ; 14(10): 822-843.e22, 2023 10 18.
Article in English | MEDLINE | ID: mdl-37751736

ABSTRACT

Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.


Subject(s)
Models, Biological , Systems Biology , Stochastic Processes , RNA , Genomics