Results 1-20 of 164
1.
Cell ; 179(3): 713-728.e17, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31626771

ABSTRACT

The ventrolateral subdivision of the ventromedial hypothalamus (VMHvl) contains ∼4,000 neurons that project to multiple targets and control innate social behaviors including aggression and mounting. However, the number of cell types in VMHvl and their relationship to connectivity and behavioral function are unknown. We performed single-cell RNA sequencing using two independent platforms, SMART-seq (∼4,500 neurons) and 10x (∼78,000 neurons), and investigated correspondence between transcriptomic identity and axonal projections or behavioral activation, respectively. Canonical correlation analysis (CCA) identified 17 transcriptomic types (T-types), including several sexually dimorphic clusters, the majority of which were validated by seqFISH. Immediate early gene analysis identified T-types exhibiting preferential responses to intruder males versus females but only rare examples of behavior-specific activation. Unexpectedly, many VMHvl T-types comprise a mixed population of neurons with different projection target preferences. Overall, our analysis revealed that, surprisingly, few VMHvl T-types exhibit a clear correspondence with behavior-specific activation and connectivity.


Subjects
Hypothalamus/cytology; Neurons/classification; Social Behavior; Animals; Estrogen Receptor alpha/genetics; Estrogen Receptor alpha/metabolism; Female; Hypothalamus/physiology; Male; Mice; Mice, Inbred BALB C; Mice, Inbred C57BL; Neurons/metabolism; Neurons/physiology; Sexual Behavior, Animal; Single-Cell Analysis; Transcriptome
2.
Nature ; 598(7879): 195-199, 2021 10.
Article in English | MEDLINE | ID: mdl-34616073

ABSTRACT

Full-length SMART-seq [1] single-cell RNA sequencing can be used to measure gene expression at isoform resolution, making possible the identification of specific isoform markers for different cell types. Used in conjunction with spatial RNA capture and gene-tagging methods, this enables the inference of spatially resolved isoform expression for different cell types. Here, in a comprehensive analysis of 6,160 mouse primary motor cortex cells assayed with SMART-seq, 280,327 cells assayed with MERFISH [2] and 94,162 cells assayed with 10x Genomics sequencing [3], we find examples of isoform specificity in cell types (including isoform shifts between cell types that are masked in gene-level analysis), as well as examples of transcriptional regulation. Additionally, we show that isoform specificity helps to refine cell types, and that a multi-platform analysis of single-cell transcriptomic data leveraging multiple measurements provides a comprehensive atlas of transcription in the mouse primary motor cortex that improves on the possibilities offered by any single technology.


Subjects
Gene Expression Profiling; In Situ Hybridization, Fluorescence; Motor Cortex/cytology; Neurons/classification; Single-Cell Analysis; Transcriptome; Animals; Atlases as Topic; Female; GABAergic Neurons/cytology; GABAergic Neurons/metabolism; Glutamates/metabolism; Male; Mice; Mice, Inbred C57BL; Motor Cortex/anatomy & histology; Neurons/cytology; Neurons/metabolism; Organ Specificity; Sequence Analysis
3.
Mol Cell ; 72(1): 7-9, 2018 10 04.
Article in English | MEDLINE | ID: mdl-30290149

ABSTRACT

Applying a kinetic model of RNA transcription and splicing, La Manno et al. (2018) predict changes in mRNA levels of individual cells from single-cell RNA-seq data.
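As a rough illustration of the kinetic model summarized above, the sketch below assumes the standard two-species ODE du/dt = α − βu, ds/dt = βu − γs for unspliced (u) and spliced (s) counts and estimates velocity under a steady-state assumption. The function name `steady_state_velocity` is hypothetical, the ratio γ/β is fitted on all cells for simplicity (published implementations typically restrict that fit to cells at extreme expression quantiles), and this is not code from velocyto or from the highlighted paper.

```python
import numpy as np

def steady_state_velocity(u, s):
    """Minimal steady-state RNA velocity sketch for one gene (hypothetical helper).

    u, s: unspliced and spliced counts per cell (1-D arrays).
    At steady state du/dt = 0 gives u = (gamma/beta) * s. The ratio is fitted
    by least squares through the origin; the residual u - ratio * s equals
    (ds/dt) / beta, so its sign indicates induction (+) or repression (-).
    """
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    ratio = np.dot(u, s) / np.dot(s, s)  # least-squares estimate of gamma / beta
    return u - ratio * s, ratio
```

Cells lying exactly on the fitted steady-state line get zero velocity; a cell with excess unspliced RNA gets a positive value.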


Subjects
RNA Splicing; RNA; Kinetics; RNA, Messenger
4.
Nat Methods ; 19(5): 534-546, 2022 05.
Article in English | MEDLINE | ID: mdl-35273392

ABSTRACT

The function of many biological systems, such as embryos, liver lobules, intestinal villi, and tumors, depends on the spatial organization of their cells. In the past decade, high-throughput technologies have been developed to quantify gene expression in space, and computational methods have been developed that leverage spatial gene expression data to identify genes with spatial patterns and to delineate neighborhoods within tissues. To comprehensively document spatial gene expression technologies and data-analysis methods, we present a curated review of literature on spatial transcriptomics dating back to 1987, along with a thorough analysis of trends in the field, such as usage of experimental techniques, species, tissues studied, and computational approaches used. Our Review places current methods in a historical context, and we derive insights about the field that can guide current research strategies. A companion supplement offers a more detailed look at the technologies and methods analyzed: https://pachterlab.github.io/LP_2021/ .


Subjects
Museums; Transcriptome; Liver
5.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38876979

ABSTRACT

MOTIVATION: Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. RESULTS: We present a tool, called splitcode, that enables flexible and efficient parsing, interpreting, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays. AVAILABILITY AND IMPLEMENTATION: The splitcode program is available at http://github.com/pachterlab/splitcode.
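One basic operation behind read parsing of this kind is locating a known tag (for example an adapter or linker) in a read while tolerating a few sequencing errors, then trimming around it. The sketch below does this with a Hamming-distance scan. The helper names `find_tag` and `trim_after_tag` are hypothetical; this is not splitcode's API, configuration format, or matching algorithm.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_tag(read: str, tag: str, max_mismatches: int = 0) -> int:
    """Leftmost start of `tag` in `read` within the mismatch budget, or -1 (hypothetical helper)."""
    for start in range(len(read) - len(tag) + 1):
        if hamming(read[start:start + len(tag)], tag) <= max_mismatches:
            return start
    return -1

def trim_after_tag(read: str, tag: str, max_mismatches: int = 0):
    """Drop the tag and everything before it; return None if the tag is not found."""
    pos = find_tag(read, tag, max_mismatches)
    return None if pos < 0 else read[pos + len(tag):]
```

A linear scan is enough to show the idea; a real preprocessor handling millions of reads and many tags would use an index rather than comparing every window.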


Subjects
High-Throughput Nucleotide Sequencing; Software; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Gene Library
6.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38377393

ABSTRACT

MOTIVATION: Eukaryotic linear motifs (ELMs), or short linear motifs, are protein interaction modules that play an essential role in cellular processes and signaling networks and are often involved in diseases such as cancer. The ELM database is a collection of manually curated motif knowledge from scientific papers. It has become a crucial resource for investigating motif biology and recognizing candidate ELMs in novel amino acid sequences. Users can search amino acid sequences or UniProt Accessions on the ELM resource web interface. However, as with many web services, there are limitations on the swift processing of large-scale queries through the ELM web interface or API calls, and therefore integration into protein function analysis pipelines is limited. RESULTS: To allow swift, large-scale motif analyses on protein sequences using ELMs curated in the ELM database, we have extended the gget suite of Python and command line tools with a new module, gget elm, which efficiently finds candidate ELMs in user-submitted amino acid sequences and UniProt Accessions without relying on the ELM server. gget elm increases accessibility to the information stored in the ELM database and allows scalable searches for motif-mediated interaction sites in amino acid sequences. AVAILABILITY AND IMPLEMENTATION: The manual and source code are available at https://github.com/pachterlab/gget.
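ELM motif classes are defined by regular expressions over amino acid sequences, so a local search reduces to scanning a sequence with each pattern. The sketch below shows that scan using Python's `re` module, with a zero-width lookahead so that overlapping candidate sites are all reported. The pattern `[RK].[ST]` is an illustrative placeholder, not an entry from the ELM database, and `scan_motif` is a hypothetical helper rather than part of gget.

```python
import re

def scan_motif(sequence: str, pattern: str):
    """Return (0-based start, matched text) for every, possibly overlapping, match.

    Wrapping the pattern in a lookahead makes each match zero-width, so
    re.finditer advances one position at a time and reports overlaps.
    Group 1 is the outer capture group, i.e. the full motif match.
    """
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({pattern}))", sequence)]
```

For example, `scan_motif("KKSS", "[RK].[ST]")` reports both `KKS` and `KSS`, which a plain `re.finditer` without the lookahead would not.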


Subjects
Proteins; Software; Amino Acid Motifs; Databases, Protein; Proteins/chemistry; Amino Acid Sequence
7.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38579259

ABSTRACT

MOTIVATION: Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. RESULTS: We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. AVAILABILITY AND IMPLEMENTATION: The specification and associated seqspec command line tool are available at https://www.doi.org/10.5281/zenodo.10213865.
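To make the idea of a machine-readable read structure concrete, the sketch below describes a read as an ordered list of named, fixed-length regions and uses that description to cut reads into their elements. The region names and lengths are hypothetical, and the list-of-tuples layout is an assumption for illustration; it is not the seqspec file format or its command line interface.

```python
from typing import Dict, List, Tuple

Layout = List[Tuple[str, int]]  # ordered (region name, fixed length) pairs

def region_slices(layout: Layout) -> Dict[str, slice]:
    """Convert an ordered layout into read-coordinate slices (hypothetical helper)."""
    slices, offset = {}, 0
    for name, length in layout:
        slices[name] = slice(offset, offset + length)
        offset += length
    return slices

def split_read(read: str, layout: Layout) -> Dict[str, str]:
    """Cut a read into its named regions; raise ValueError if the read is too short."""
    total = sum(length for _, length in layout)
    if len(read) < total:
        raise ValueError(f"read length {len(read)} < layout length {total}")
    return {name: read[sl] for name, sl in region_slices(layout).items()}
```

Keeping the layout as data rather than hard-coding offsets in a script is the point: the same `split_read` works for any assay whose layout is written down.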


Subjects
Genomics; Software
8.
Biophys J ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38715358

ABSTRACT

The advent of high-throughput transcriptomics provides an opportunity to advance mechanistic understanding of transcriptional processes and their connections to cellular function at an unprecedented, genome-wide scale. These transcriptional systems, which involve discrete stochastic events, are naturally modeled using chemical master equations (CMEs), which can be solved for probability distributions to fit biophysical rates that govern system dynamics. While CME models have been used as standards in fluorescence transcriptomics for decades to analyze single-species RNA distributions, there are often no closed-form solutions to CMEs that model multiple species, such as nascent and mature RNA transcript counts. This has prevented the application of standard likelihood-based statistical methods for analyzing high-throughput, multi-species transcriptomic datasets using biophysical models. Inspired by recent work in machine learning on learning solutions to complex dynamical systems, we leverage neural networks and statistical understanding of system distributions to produce accurate approximations to a steady-state bivariate distribution for a model of the RNA life cycle that includes nascent and mature molecules. The steady-state distribution of this simple model has no closed-form solution and requires intensive numerical solving techniques: our approach reduces likelihood evaluation time by several orders of magnitude. We demonstrate two approaches, whereby solutions are approximated by 1) learning the weights of kernel distributions with constrained parameters or 2) learning both weights and scaling factors for parameters of kernel distributions.
We show that our strategies, denoted by kernel weight regression and parameter-scaled kernel weight regression, respectively, enable broad exploration of parameter space and can be used in existing likelihood frameworks to infer transcriptional burst sizes, RNA splicing rates, and mRNA degradation rates from experimental transcriptomic data.
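The core representation in kernel weight regression is a target distribution written as a weighted sum of simple kernel distributions. The sketch below illustrates only that representation, under several stated assumptions: it is univariate rather than bivariate, the kernels are Poisson PMFs with hand-picked means, and the weights for a single fixed target are found by non-negative least squares (`scipy.optimize.nnls`) instead of being predicted by a neural network from model parameters, as the abstract describes. `fit_kernel_weights` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import poisson

def fit_kernel_weights(target_pmf, kernel_means):
    """Non-negative weights w with sum_k w[k] * Poisson(kernel_means[k]) ≈ target_pmf.

    Swapped-in technique: NNLS on one target, not the paper's learned regression.
    Returns the weights and the Euclidean norm of the fit residual.
    """
    x = np.arange(len(target_pmf))
    # Each column is one kernel PMF evaluated on the same support as the target.
    K = np.column_stack([poisson.pmf(x, m) for m in kernel_means])
    w, residual = nnls(K, np.asarray(target_pmf, dtype=float))
    return w, residual
```

The speed-up reported in the abstract comes from replacing a numerical CME solve with an inexpensive evaluation of such a mixture once its weights are known.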

9.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36610989

ABSTRACT

MOTIVATION: A recurring challenge in interpreting genomic data is the assessment of results in the context of existing reference databases. With the increasing number of command line and Python users, there is a need for tools implementing automated, easy programmatic access to curated reference information stored in a diverse collection of large, public genomic databases. RESULTS: gget is a free and open-source command line tool and Python package that enables efficient querying of genomic reference databases, such as Ensembl. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying required for genomic data analysis in a single line of code. AVAILABILITY AND IMPLEMENTATION: The manual and source code are available at https://github.com/pachterlab/gget. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics; Software; Genomics/methods; Genome; Databases, Factual; Data Analysis
10.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36610997

ABSTRACT

MOTIVATION: Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction. RESULTS: We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access. AVAILABILITY AND IMPLEMENTATION: ffq is free and open source, and the code can be found here: https://github.com/pachterlab/ffq.


Subjects
Metadata; Software; Databases, Nucleic Acid
11.
PLoS Comput Biol ; 19(8): e1011288, 2023 08.
Article in English | MEDLINE | ID: mdl-37590228

ABSTRACT

Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.
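One simple way to quantify the distortion described above is to compare all pairwise distances before and after projecting to two dimensions. The sketch below uses PCA for the projection and Pearson correlation of pairwise distances as the score. Both choices are assumptions made for illustration, not necessarily the embeddings or distortion metrics analyzed in the paper, and the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.clip(d2, 0, None))  # clip tiny negatives from rounding

def distance_preservation(X, n_components=2):
    """Pearson correlation between pairwise distances before and after PCA."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes; keep the first n_components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:n_components].T
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    return np.corrcoef(pairwise_distances(X)[iu], pairwise_distances(Y)[iu])[0, 1]
```

Data that genuinely lie on a plane score close to 1, while isotropic high-dimensional data score far lower, because two coordinates capture only a small share of each pairwise distance.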


Subjects
Data Analysis; Genomics; Humans
12.
PLoS Comput Biol ; 18(9): e1010492, 2022 09.
Article in English | MEDLINE | ID: mdl-36094956

ABSTRACT

We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.
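A Markovian treatment of the RNA life cycle models transcription, splicing and degradation as a continuous-time Markov chain over copy numbers. The sketch below samples such a chain with Gillespie's stochastic simulation algorithm for the simplest case, constitutive transcription followed by first-order splicing and degradation. The model choice, rate values and the helper name `simulate_transcription` are assumptions for illustration, not the framework presented in the paper. For this model the stationary means are α/β (unspliced) and α/γ (spliced), which the time averages should approach.

```python
import numpy as np

def simulate_transcription(alpha, beta, gamma, t_max, seed=0):
    """Gillespie simulation of  0 -alpha-> u -beta-> s -gamma-> 0  (hypothetical helper).

    Returns the time-averaged unspliced and spliced copy numbers over [0, t_max].
    """
    rng = np.random.default_rng(seed)
    t, u, s = 0.0, 0, 0
    area_u = area_s = 0.0
    while True:
        a0, a1, a2 = alpha, beta * u, gamma * s  # reaction propensities
        total = a0 + a1 + a2
        dt = rng.exponential(1.0 / total)        # holding time in the current state
        if t + dt >= t_max:                      # last interval is cut at t_max
            area_u += u * (t_max - t)
            area_s += s * (t_max - t)
            break
        area_u += u * dt
        area_s += s * dt
        t += dt
        r = rng.random() * total                 # choose which reaction fires
        if r < a0:
            u += 1                               # transcription
        elif r < a0 + a1:
            u -= 1
            s += 1                               # splicing
        elif s > 0:
            s -= 1                               # degradation of spliced RNA
    return area_u / t_max, area_s / t_max
```

With α = 10, β = 1 and γ = 0.5, for example, the averages should be close to 10 and 20.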


Subjects
RNA; RNA/genetics; Workflow
13.
Bull Math Biol ; 85(11): 114, 2023 10 12.
Article in English | MEDLINE | ID: mdl-37828255

ABSTRACT

The serial nature of reactions involved in the RNA life-cycle motivates the incorporation of delays in models of transcriptional dynamics. The models couple a transcriptional process to a fairly general set of delayed monomolecular reactions with no feedback. We provide numerical strategies for calculating the RNA copy number distributions induced by these models, and solve several systems with splicing, degradation, and catalysis. An analysis of single-cell and single-nucleus RNA sequencing data using these models reveals that the kinetics of nuclear export do not appear to require invocation of a non-Markovian waiting time.


Subjects
Mathematical Concepts; Models, Biological; Stochastic Processes; Computer Simulation; RNA; Markov Chains; Algorithms
14.
Nucleic Acids Res ; 49(20): e117, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34417615

ABSTRACT

Scaffolding, i.e., ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second-generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo, with approximations that make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes a similar or greater number of correct joins compared with other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.
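To show what a maximum likelihood gap estimate looks like in the simplest case, the sketch below assumes each read pair linking two contigs has insert size equal to its observed span (the parts inside the two contigs) plus the unknown gap, with insert sizes drawn from a normal distribution of known mean. Under that assumption the likelihood is maximized at the insert mean minus the mean observed span. This is a deliberately simplified model, not Swalo's generative model, and both function names are hypothetical.

```python
import numpy as np

def gap_mle_normal(observed_spans, insert_mean):
    """Gap MLE assuming insert = span + gap and insert ~ Normal(insert_mean, sd).

    Setting the derivative of sum_i -(span_i + gap - insert_mean)^2 / (2 sd^2)
    to zero gives gap = insert_mean - mean(span), regardless of sd.
    A negative value would suggest the contigs overlap.
    """
    return insert_mean - float(np.mean(observed_spans))

def log_likelihood(gap, spans, insert_mean, insert_sd):
    """Gaussian log-likelihood of a candidate gap, for checking the closed form."""
    z = (np.asarray(spans, dtype=float) + gap - insert_mean) / insert_sd
    return float(np.sum(-0.5 * z**2 - np.log(insert_sd * np.sqrt(2 * np.pi))))
```

A brute-force search over candidate gaps with `log_likelihood` should peak at the same value the closed form returns.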


Subjects
Contig Mapping/methods; Sequence Analysis, DNA/methods; Software; Bacteria; Humans; Likelihood Functions
15.
Biophys J ; 121(6): 1056-1069, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35143775

ABSTRACT

Splicing cascades that alter gene products posttranscriptionally also affect expression dynamics. We study a class of processes and associated distributions that emerge from models of bursty promoters coupled to directed acyclic graphs of splicing. These solutions provide full time-dependent joint distributions for an arbitrary number of species with general noise behaviors and transient phenomena, offering qualitative and quantitative insights about how splicing can regulate expression dynamics. Finally, we derive a set of quantitative constraints on the minimum complexity necessary to reproduce gene coexpression patterns using synchronized burst models. We validate these findings by analyzing long-read sequencing data, where we find evidence of expression patterns largely consistent with these constraints.


Subjects
Models, Genetic; Proteins; Promoter Regions, Genetic; Stochastic Processes
16.
Nat Methods ; 16(2): 163-166, 2019 02.
Article in English | MEDLINE | ID: mdl-30664774

ABSTRACT

Single-cell RNA-seq makes it possible to characterize the transcriptomes of cell types across different conditions and to identify their transcriptional signatures via differential analysis. Our method detects changes in transcript dynamics and in overall gene abundance in large numbers of cells to determine differential expression. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3' single-cell RNA-seq that can identify previously undetectable marker genes.


Subjects
Sequence Analysis, RNA; Single-Cell Analysis/instrumentation; Single-Cell Analysis/methods; Algorithms; Computer Simulation; Databases, Genetic; Gene Expression Profiling; Gene Expression Regulation; Genetic Markers; Humans; Leukocytes, Mononuclear/cytology; Protein Isoforms; RNA/genetics; Regression Analysis; Software; T-Lymphocytes, Cytotoxic/cytology; Transcriptome
17.
Bioinformatics ; 36(11): 3418-3421, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32176273

ABSTRACT

MOTIVATION: Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. RESULTS: We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications. AVAILABILITY AND IMPLEMENTATION: The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/. CONTACT: v@nxn.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA-Seq; Single-Cell Analysis; Bayes Theorem; Sequence Analysis, RNA; Software; Exome Sequencing
18.
PLoS Genet ; 14(12): e1007841, 2018 12.
Article in English | MEDLINE | ID: mdl-30566439

ABSTRACT

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.
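The coupling step described above relies on canonical correlation analysis. The sketch below is the textbook computation: orthonormalize each centered matrix with a QR decomposition, then take the singular values of the product of the two bases, which are the canonical correlations. Treating `X` as genotype principal components and `Y` as expression principal components is an assumption about how it would be applied here; the helper name is hypothetical, and the method's significance testing per gene is not shown.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the column spaces of X and Y (rows = samples).

    Assumes more samples than columns and full column rank after centering.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two centered column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are cosines of the principal angles between
    # the subspaces, i.e. the canonical correlations, in descending order.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)
```

If every column of `Y` is a linear combination of columns of `X`, all canonical correlations equal 1; for independent random matrices they are small.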


Subjects
Gene Expression; Genetics, Population; Female; Gene Frequency; Genetic Variation; Genotype; Humans; Male; Polymorphism, Single Nucleotide; Principal Component Analysis; Quantitative Trait Loci; Sequence Analysis, RNA; Whole Genome Sequencing
19.
Nat Methods ; 19(5): 628, 2022 May.
Article in English | MEDLINE | ID: mdl-35440782
20.
Nat Methods ; 14(7): 687-690, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28581496

ABSTRACT

We describe sleuth (http://pachterlab.github.io/sleuth), a method for the differential analysis of gene expression data that utilizes bootstrapping in conjunction with response error linear modeling to decouple biological variance from inferential variance. sleuth is implemented in an interactive shiny app that utilizes kallisto quantifications and bootstraps for fast and accurate analysis of data from RNA-seq experiments.
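The decoupling idea can be sketched for a single transcript. The observed variance of abundance estimates across samples combines biological variance with inferential variance from quantification, and bootstraps estimate the inferential part. The minimal sketch below subtracts the mean bootstrap variance from the total and floors the result at zero. It illustrates only that decomposition under stated assumptions; sleuth additionally fits response error linear models and shares information across transcripts, none of which is shown, and `biological_variance` is a hypothetical helper.

```python
import numpy as np

def biological_variance(estimates, bootstrap_variances):
    """Split one transcript's across-sample variance into inferential and biological parts.

    estimates: per-sample abundance estimates (for example, log-transformed).
    bootstrap_variances: per-sample variance across that sample's bootstraps.
    Returns (total, inferential, biological), with biological floored at zero.
    """
    total = float(np.var(estimates, ddof=1))         # sample variance across replicates
    inferential = float(np.mean(bootstrap_variances))
    return total, inferential, max(total - inferential, 0.0)
```

The floor matters in practice: with few replicates the total variance can fall below the bootstrap estimate, and a negative biological variance would be meaningless.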


Subjects
Computer Simulation; Gene Expression/physiology; RNA/genetics; Software; Base Sequence; Models, Biological