Search | BVS Search Portal

1.

Multimodal Analysis of Cell Types in a Hypothalamic Node Controlling Social Behavior.

Kim, Dong-Wook; Yao, Zizhen; Graybuck, Lucas T; Kim, Tae Kyung; Nguyen, Thuc Nghi; Smith, Kimberly A; Fong, Olivia; Yi, Lynn; Koulena, Noushin; Pierson, Nico; Shah, Sheel; Lo, Liching; Pool, Allan-Hermann; Oka, Yuki; Pachter, Lior; Cai, Long; Tasic, Bosiljka; Zeng, Hongkui; Anderson, David J.

Cell ; 179(3): 713-728.e17, 2019 10 17.

Article in English | MEDLINE | ID: mdl-31626771

ABSTRACT

The ventrolateral subdivision of the ventromedial hypothalamus (VMHvl) contains ~4,000 neurons that project to multiple targets and control innate social behaviors including aggression and mounting. However, the number of cell types in VMHvl and their relationship to connectivity and behavioral function are unknown. We performed single-cell RNA sequencing using two independent platforms, SMART-seq (~4,500 neurons) and 10x (~78,000 neurons), and investigated correspondence between transcriptomic identity and axonal projections or behavioral activation, respectively. Canonical correlation analysis (CCA) identified 17 transcriptomic types (T-types), including several sexually dimorphic clusters, the majority of which were validated by seqFISH. Immediate early gene analysis identified T-types exhibiting preferential responses to intruder males versus females but only rare examples of behavior-specific activation. Unexpectedly, many VMHvl T-types comprise a mixed population of neurons with different projection target preferences. Overall our analysis revealed that, surprisingly, few VMHvl T-types exhibit a clear correspondence with behavior-specific activation and connectivity.

Subject(s)

Hypothalamus/cytology , Neurons/classification , Social Behavior , Animals , Estrogen Receptor alpha/genetics , Estrogen Receptor alpha/metabolism , Female , Hypothalamus/physiology , Male , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Neurons/metabolism , Neurons/physiology , Sexual Behavior, Animal , Single-Cell Analysis , Transcriptome

2.

Biophysical modeling with variational autoencoders for bimodal, single-cell RNA sequencing data.

Carilli, Maria; Gorin, Gennady; Choi, Yongin; Chari, Tara; Pachter, Lior.

Nat Methods ; 21(8): 1466-1469, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39054391

ABSTRACT

Here we present biVI, which combines the variational autoencoder framework of scVI with biophysical models describing the transcription and splicing kinetics of RNA molecules. We demonstrate on simulated and experimental single-cell RNA sequencing data that biVI retains the variational autoencoder's ability to capture cell type structure in a low-dimensional space while further enabling genome-wide exploration of the biophysical mechanisms, such as system burst sizes and degradation rates, that underlie observations.

Subject(s)

Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , RNA Splicing , Algorithms , RNA/genetics , RNA/chemistry

3.

Isoform cell-type specificity in the mouse primary motor cortex.

Booeshaghi, A Sina; Yao, Zizhen; van Velthoven, Cindy; Smith, Kimberly; Tasic, Bosiljka; Zeng, Hongkui; Pachter, Lior.

Nature ; 598(7879): 195-199, 2021 10.

Article in English | MEDLINE | ID: mdl-34616073

ABSTRACT

Full-length SMART-seq single-cell RNA sequencing can be used to measure gene expression at isoform resolution, making possible the identification of specific isoform markers for different cell types. Used in conjunction with spatial RNA capture and gene-tagging methods, this enables the inference of spatially resolved isoform expression for different cell types. Here, in a comprehensive analysis of 6,160 mouse primary motor cortex cells assayed with SMART-seq, 280,327 cells assayed with MERFISH and 94,162 cells assayed with 10x Genomics sequencing, we find examples of isoform specificity in cell types (including isoform shifts between cell types that are masked in gene-level analysis) as well as examples of transcriptional regulation. Additionally, we show that isoform specificity helps to refine cell types, and that a multi-platform analysis of single-cell transcriptomic data leveraging multiple measurements provides a comprehensive atlas of transcription in the mouse primary motor cortex that improves on the possibilities offered by any single technology.

Subject(s)

Gene Expression Profiling , In Situ Hybridization, Fluorescence , Motor Cortex/cytology , Neurons/classification , Single-Cell Analysis , Transcriptome , Animals , Atlases as Topic , Female , GABAergic Neurons/cytology , GABAergic Neurons/metabolism , Glutamates/metabolism , Male , Mice , Mice, Inbred C57BL , Motor Cortex/anatomy & histology , Neurons/cytology , Neurons/metabolism , Organ Specificity , Sequence Analysis

4.

RNA Velocity: Molecular Kinetics from Single-Cell RNA-Seq.

Svensson, Valentine; Pachter, Lior.

Mol Cell ; 72(1): 7-9, 2018 10 04.

Article in English | MEDLINE | ID: mdl-30290149

ABSTRACT

Applying a kinetic model of RNA transcription and splicing, La Manno et al. (2018) predict changes in mRNA levels of individual cells from single-cell RNA-seq data.

Subject(s)

RNA Splicing , RNA , Kinetics , RNA, Messenger

5.

Museum of spatial transcriptomics.

Moses, Lambda; Pachter, Lior.

Nat Methods ; 19(5): 534-546, 2022 05.

Article in English | MEDLINE | ID: mdl-35273392

ABSTRACT

The function of many biological systems, such as embryos, liver lobules, intestinal villi, and tumors, depends on the spatial organization of their cells. In the past decade, high-throughput technologies have been developed to quantify gene expression in space, and computational methods have been developed that leverage spatial gene expression data to identify genes with spatial patterns and to delineate neighborhoods within tissues. To comprehensively document spatial gene expression technologies and data-analysis methods, we present a curated review of literature on spatial transcriptomics dating back to 1987, along with a thorough analysis of trends in the field, such as usage of experimental techniques, species, tissues studied, and computational approaches used. Our Review places current methods in a historical context, and we derive insights about the field that can guide current research strategies. A companion supplement offers a more detailed look at the technologies and methods analyzed: https://pachterlab.github.io/LP_2021/ .

Subject(s)

Museums , Transcriptome , Liver

6.

Flexible parsing, interpretation, and editing of technical sequences with splitcode.

Sullivan, Delaney K; Pachter, Lior.

Bioinformatics ; 40(6)2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38876979

ABSTRACT

MOTIVATION: Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. RESULTS: We present a tool, called splitcode, that enables flexible and efficient parsing, interpreting, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays. AVAILABILITY AND IMPLEMENTATION: The splitcode program is available at http://github.com/pachterlab/splitcode.

Subject(s)

High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Gene Library

7.

A machine-readable specification for genomics assays.

Booeshaghi, Ali Sina; Chen, Xi; Pachter, Lior.

Bioinformatics ; 40(4)2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38579259

ABSTRACT

MOTIVATION: Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. RESULTS: We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. AVAILABILITY AND IMPLEMENTATION: The specification and the associated seqspec command line tool are available at https://www.doi.org/10.5281/zenodo.10213865.

Subject(s)

Genomics , Software

8.

Fast and scalable querying of eukaryotic linear motifs with gget elm.

Luebbert, Laura; Hoang, Chi; Kumar, Manjeet; Pachter, Lior.

Bioinformatics ; 40(3)2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38377393

ABSTRACT

MOTIVATION: Eukaryotic linear motifs (ELMs), or Short Linear Motifs, are protein interaction modules that play an essential role in cellular processes and signaling networks and are often involved in diseases like cancer. The ELM database is a collection of manually curated motif knowledge from scientific papers. It has become a crucial resource for investigating motif biology and recognizing candidate ELMs in novel amino acid sequences. Users can search amino acid sequences or UniProt Accessions on the ELM resource web interface. However, as with many web services, there are limitations in the swift processing of large-scale queries through the ELM web interface or API calls, and, therefore, integration into protein function analysis pipelines is limited. RESULTS: To allow swift, large-scale motif analyses on protein sequences using ELMs curated in the ELM database, we have extended the gget suite of Python and command line tools with a new module, gget elm, which does not rely on the ELM server for efficiently finding candidate ELMs in user-submitted amino acid sequences and UniProt Accessions. gget elm increases accessibility to the information stored in the ELM database and allows scalable searches for motif-mediated interaction sites in the amino acid sequences. AVAILABILITY AND IMPLEMENTATION: The manual and source code are available at https://github.com/pachterlab/gget.

Subject(s)

Proteins , Software , Amino Acid Motifs , Databases, Protein , Proteins/chemistry , Amino Acid Sequence

9.

Spectral neural approximations for models of transcriptional dynamics.

Gorin, Gennady; Carilli, Maria; Chari, Tara; Pachter, Lior.

Biophys J ; 2024 May 06.

Article in English | MEDLINE | ID: mdl-38715358

ABSTRACT

The advent of high-throughput transcriptomics provides an opportunity to advance mechanistic understanding of transcriptional processes and their connections to cellular function at an unprecedented, genome-wide scale. These transcriptional systems, which involve discrete stochastic events, are naturally modeled using chemical master equations (CMEs), which can be solved for probability distributions to fit biophysical rates that govern system dynamics. While CME models have been used as standards in fluorescence transcriptomics for decades to analyze single-species RNA distributions, there are often no closed-form solutions to CMEs that model multiple species, such as nascent and mature RNA transcript counts. This has prevented the application of standard likelihood-based statistical methods for analyzing high-throughput, multi-species transcriptomic datasets using biophysical models. Inspired by recent work in machine learning to learn solutions to complex dynamical systems, we leverage neural networks and statistical understanding of system distributions to produce accurate approximations to a steady-state bivariate distribution for a model of the RNA life cycle that includes nascent and mature molecules. The steady-state distribution to this simple model has no closed-form solution and requires intensive numerical solving techniques: our approach reduces likelihood evaluation time by several orders of magnitude. We demonstrate two approaches, whereby solutions are approximated by 1) learning the weights of kernel distributions with constrained parameters or 2) learning both weights and scaling factors for parameters of kernel distributions. We show that our strategies, denoted by kernel weight regression and parameter-scaled kernel weight regression, respectively, enable broad exploration of parameter space and can be used in existing likelihood frameworks to infer transcriptional burst sizes, RNA splicing rates, and mRNA degradation rates from experimental transcriptomic data.

10.

Efficient querying of genomic reference databases with gget.

Luebbert, Laura; Pachter, Lior.

Bioinformatics ; 39(1)2023 01 01.

Article in English | MEDLINE | ID: mdl-36610989

ABSTRACT

MOTIVATION: A recurring challenge in interpreting genomic data is the assessment of results in the context of existing reference databases. With the increasing number of command line and Python users, there is a need for tools implementing automated, easy programmatic access to curated reference information stored in a diverse collection of large, public genomic databases. RESULTS: gget is a free and open-source command line tool and Python package that enables efficient querying of genomic reference databases, such as Ensembl. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying required for genomic data analysis in a single line of code. AVAILABILITY AND IMPLEMENTATION: The manual and source code are available at https://github.com/pachterlab/gget. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genomics , Software , Genomics/methods , Genome , Databases, Factual , Data Analysis

11.

Metadata retrieval from sequence databases with ffq.

Gálvez-Merchán, Ángel; Min, Kyung Hoi Joseph; Pachter, Lior; Booeshaghi, A Sina.

Bioinformatics ; 39(1)2023 01 01.

Article in English | MEDLINE | ID: mdl-36610997

ABSTRACT

MOTIVATION: Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction. RESULTS: We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access. AVAILABILITY AND IMPLEMENTATION: ffq is free and open source, and the code can be found here: https://github.com/pachterlab/ffq.

Subject(s)

Metadata , Software , Databases, Nucleic Acid

12.

The specious art of single-cell genomics.

Chari, Tara; Pachter, Lior.

PLoS Comput Biol ; 19(8): e1011288, 2023 08.

Article in English | MEDLINE | ID: mdl-37590228

ABSTRACT

Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.

Subject(s)

Data Analysis , Genomics , Humans
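
The distortion claim in the abstract of entry 12 can be illustrated with a small numerical sketch. This is not the authors' analysis: it assumes a hypothetical, structureless dataset (200 points drawn from an isotropic Gaussian in 100 dimensions) and uses plain linear PCA as the reduction to 2 dimensions. All variable names are illustrative. The sketch compares pairwise distances before and after the reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 "cells" in 100 dimensions with no cluster structure.
X = rng.normal(size=(200, 100))
Xc = X - X.mean(axis=0)

# Linear PCA via SVD; keep the scores on the first two components.
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
Y = U[:, :2] * S[:2]


def pairwise_distances(M):
    """Euclidean distances between all distinct pairs of rows of M."""
    sq = (M ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    upper = np.triu_indices(len(M), k=1)
    return np.sqrt(np.maximum(d2[upper], 0.0))


d_high = pairwise_distances(Xc)   # distances in the original space
d_low = pairwise_distances(Y)     # distances in the 2-D embedding

# Fraction of total variance retained by the two components.
variance_kept = (S[:2] ** 2).sum() / (S ** 2).sum()
# How well the embedding preserves the ordering and scale of distances.
distance_corr = np.corrcoef(d_high, d_low)[0, 1]

print(f"variance kept by 2 PCs: {variance_kept:.3f}")
print(f"correlation of pairwise distances: {distance_corr:.3f}")
```

An orthogonal projection can only shrink distances, so every embedded distance is at most the original one. How much the distances are distorted depends on the data and on the embedding method, so the numbers printed here say nothing about nonlinear methods or real single-cell datasets.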

13.

RNA velocity unraveled.

Gorin, Gennady; Fang, Meichen; Chari, Tara; Pachter, Lior.

PLoS Comput Biol ; 18(9): e1010492, 2022 09.

Article in English | MEDLINE | ID: mdl-36094956

ABSTRACT

We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.

Subject(s)

RNA , RNA/genetics , Workflow

14.

Assessing Markovian and Delay Models for Single-Nucleus RNA Sequencing.

Gorin, Gennady; Yoshida, Shawn; Pachter, Lior.

Bull Math Biol ; 85(11): 114, 2023 10 12.

Article in English | MEDLINE | ID: mdl-37828255

ABSTRACT

The serial nature of reactions involved in the RNA life-cycle motivates the incorporation of delays in models of transcriptional dynamics. The models couple a transcriptional process to a fairly general set of delayed monomolecular reactions with no feedback. We provide numerical strategies for calculating the RNA copy number distributions induced by these models, and solve several systems with splicing, degradation, and catalysis. An analysis of single-cell and single-nucleus RNA sequencing data using these models reveals that the kinetics of nuclear export do not appear to require invocation of a non-Markovian waiting time.

Subject(s)

Mathematical Concepts , Models, Biological , Stochastic Processes , Computer Simulation , RNA , Markov Chains , Algorithms

15.

SWALO: scaffolding with assembly likelihood optimization.

Rahman, Atif; Pachter, Lior.

Nucleic Acids Res ; 49(20): e117, 2021 11 18.

Article in English | MEDLINE | ID: mdl-34417615

ABSTRACT

Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more or a similar number of correct joins compared with other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.

Subject(s)

Contig Mapping/methods , Sequence Analysis, DNA/methods , Software , Bacteria , Humans , Likelihood Functions

16.

Modeling bursty transcription and splicing with the chemical master equation.

Gorin, Gennady; Pachter, Lior.

Biophys J ; 121(6): 1056-1069, 2022 03 15.

Article in English | MEDLINE | ID: mdl-35143775

ABSTRACT

Splicing cascades that alter gene products posttranscriptionally also affect expression dynamics. We study a class of processes and associated distributions that emerge from models of bursty promoters coupled to directed acyclic graphs of splicing. These solutions provide full time-dependent joint distributions for an arbitrary number of species with general noise behaviors and transient phenomena, offering qualitative and quantitative insights about how splicing can regulate expression dynamics. Finally, we derive a set of quantitative constraints on the minimum complexity necessary to reproduce gene coexpression patterns using synchronized burst models. We validate these findings by analyzing long-read sequencing data, where we find evidence of expression patterns largely consistent with these constraints.

Subject(s)

Models, Genetic , Proteins , Promoter Regions, Genetic , Stochastic Processes

17.

A discriminative learning approach to differential expression analysis for single-cell RNA-seq.

Ntranos, Vasilis; Yi, Lynn; Melsted, Páll; Pachter, Lior.

Nat Methods ; 16(2): 163-166, 2019 02.

Article in English | MEDLINE | ID: mdl-30664774

ABSTRACT

Single-cell RNA-seq makes it possible to characterize the transcriptomes of cell types across different conditions and to identify their transcriptional signatures via differential analysis. Our method detects changes in transcript dynamics and in overall gene abundance in large numbers of cells to determine differential expression. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3' single-cell RNA-seq that can identify previously undetectable marker genes.

Subject(s)

Sequence Analysis, RNA , Single-Cell Analysis/instrumentation , Single-Cell Analysis/methods , Algorithms , Computer Simulation , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation , Genetic Markers , Humans , Leukocytes, Mononuclear/cytology , Protein Isoforms , RNA/genetics , Regression Analysis , Software , T-Lymphocytes, Cytotoxic/cytology , Transcriptome

18.

Interpretable factor models of single-cell RNA-seq via variational autoencoders.

Svensson, Valentine; Gayoso, Adam; Yosef, Nir; Pachter, Lior.

Bioinformatics ; 36(11): 3418-3421, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32176273

ABSTRACT

MOTIVATION: Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. RESULTS: We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications. AVAILABILITY AND IMPLEMENTATION: The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/. CONTACT: v@nxn.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

RNA-Seq , Single-Cell Analysis , Bayes Theorem , Sequence Analysis, RNA , Software , Exome Sequencing

19.

Expression reflects population structure.

Brown, Brielin C; Bray, Nicolas L; Pachter, Lior.

PLoS Genet ; 14(12): e1007841, 2018 12.

Article in English | MEDLINE | ID: mdl-30566439

ABSTRACT

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.

Subject(s)

Gene Expression , Genetics, Population , Female , Gene Frequency , Genetic Variation , Genotype , Humans , Male , Polymorphism, Single Nucleotide , Principal Component Analysis , Quantitative Trait Loci , Sequence Analysis, RNA , Whole Genome Sequencing
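
The coupling described in the abstract of entry 19, principal components of genotype linked to principal components of expression through canonical correlation analysis, can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the authors' implementation. The two populations, the matrix sizes, the "batch" factor and the helper names `top_pcs` and `cca` are all hypothetical. CCA is computed with the standard QR-plus-SVD construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
pop = np.repeat([0, 1], n // 2)  # two hypothetical populations

# Toy genotypes (0/1/2 allele counts): allele frequencies differ by population.
freq0 = rng.uniform(0.1, 0.9, size=500)
freq1 = np.clip(freq0 + rng.normal(0, 0.15, size=500), 0.05, 0.95)
G = rng.binomial(2, np.where(pop[:, None] == 0, freq0, freq1)).astype(float)

# Toy expression: a strong population-independent "batch" factor dominates,
# and a weaker population shift affects only the first 30 genes.
batch = rng.integers(0, 2, size=n)
E = rng.normal(size=(n, 300))
E += batch[:, None] * rng.normal(size=300)[None, :]
E[:, :30] += 1.5 * pop[:, None]


def top_pcs(X, k):
    """Scores of the samples (rows of X) on the top-k principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def cca(A, B):
    """Canonical correlations and canonical variates of A and B."""
    Qa, _ = np.linalg.qr(A - A.mean(axis=0))
    Qb, _ = np.linalg.qr(B - B.mean(axis=0))
    U, s, Vt = np.linalg.svd(Qa.T @ Qb)
    return s, Qa @ U, Qb @ Vt.T


g_pcs = top_pcs(G, 5)
e_pcs = top_pcs(E, 5)
corrs, geno_variates, expr_variates = cca(g_pcs, e_pcs)

# Does the leading expression-side canonical variate track population labels?
pop_corr = abs(np.corrcoef(expr_variates[:, 0], pop)[0, 1])
print("canonical correlations:", np.round(corrs, 3))
print("|corr| of first expression variate with population:", round(pop_corr, 3))
```

In this toy setting the leading expression principal component mostly follows the batch factor, while the first canonical variate is a linear projection of the expression PCs chosen for its agreement with genotype. The choice of five components per side is arbitrary here.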

20.

Publisher Correction: Museum of spatial transcriptomics.

Moses, Lambda; Pachter, Lior.

Nat Methods ; 19(5): 628, 2022 May.

Article in English | MEDLINE | ID: mdl-35440782
