Results 1 - 20 of 22
1.
Nat Protoc ; 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39390263

ABSTRACT

The term 'RNA-seq' refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, single cells or single nuclei. The kallisto, bustools and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data. Execution of this protocol requires basic familiarity with a command line environment. With this protocol, quantification of a moderately sized RNA-seq dataset can be completed within minutes.

2.
bioRxiv ; 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38948774

ABSTRACT

CRISPR screens are powerful tools to identify key genes that underlie biological processes. One important type of screen uses fluorescence activated cell sorting (FACS) to sort perturbed cells into bins based on the expression level of marker genes, followed by guide RNA (gRNA) sequencing. Analysis of these data presents several statistical challenges due to multiple factors including the discrete nature of the bins and typically small numbers of replicate experiments. To address these challenges, we developed a robust and powerful Bayesian random effects model and software package called Waterbear. Furthermore, we used Waterbear to explore how various experimental design parameters affect statistical power to establish principled guidelines for future screens. Finally, we experimentally validated our experimental design model findings that, when using Waterbear for analysis, high power is maintained even at low cell coverage and a high multiplicity of infection. We anticipate that Waterbear will be of broad utility for analyzing FACS-based CRISPR screens.

3.
bioRxiv ; 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39071407

ABSTRACT

Mutations in the kinase and juxtamembrane domains of the MET Receptor Tyrosine Kinase are responsible for oncogenesis in various cancers and can drive resistance to MET-directed treatments. Determining the most effective inhibitor for each mutational profile is a major challenge for MET-driven cancer treatment in precision medicine. Here, we used a deep mutational scan (DMS) of ~5,764 MET kinase domain variants to profile the growth of each mutation against a panel of 11 inhibitors that are reported to target the MET kinase domain. We identified common resistance sites across type I, type II, and type I ½ inhibitors, unveiled unique resistance and sensitizing mutations for each inhibitor, and validated non-cross-resistant sensitivities for type I and type II inhibitor pairs. We augment a protein language model with biophysical and chemical features to improve the predictive performance for inhibitor-treated datasets. Together, our study demonstrates a pooled experimental pipeline for identifying resistance mutations, provides a reference dictionary for mutations that are sensitized to specific therapies, and offers insights for future drug development.

4.
Genome Biol ; 25(1): 138, 2024 05 24.
Article in English | MEDLINE | ID: mdl-38789982

ABSTRACT

Deep mutational scanning (DMS) measures the effects of thousands of genetic variants in a protein simultaneously. The small sample size renders classical statistical methods ineffective. For example, p-values cannot be correctly calibrated when treating variants independently. We propose Rosace, a Bayesian framework for analyzing growth-based DMS data. Rosace leverages amino acid position information to increase power and control the false discovery rate by sharing information across parameters via shrinkage. We also developed Rosette for simulating the distributional properties of DMS. We show that Rosace is robust to the violation of model assumptions and is more powerful than existing tools.


Subject(s)
Bayes Theorem , Humans , Software , Mutation , DNA Mutational Analysis/methods
5.
PLoS Comput Biol ; 20(2): e1011857, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38346082

ABSTRACT

A core problem in genetics is molecular quantitative trait locus (QTL) mapping, in which genetic variants associated with changes in the molecular phenotypes are identified. One of the most-studied molecular QTL mapping problems is expression QTL (eQTL) mapping, in which the molecular phenotype is gene expression. It is common in eQTL mapping to compute gene expression by aggregating the expression levels of individual isoforms from the same gene and then performing linear regression between SNPs and this aggregated gene expression level. However, SNPs may regulate isoforms from the same gene in different directions due to alternative splicing, or only regulate the expression level of one isoform, causing this approach to lose power. Here, we examine a broader question: which genes have at least one isoform whose expression level is regulated by genetic variants? In this study, we propose and evaluate several approaches to answering this question, demonstrating that "isoform-aware" methods-those that account for the expression levels of individual isoforms-have substantially greater power to answer this question than standard "gene-level" eQTL mapping methods. We identify settings in which different approaches yield an inflated number of false discoveries or lose power. In particular, we show that calling an eGene if there is a significant association between a SNP and any isoform fails to control False Discovery Rate, even when applying standard False Discovery Rate correction. We show that similar trends are observed in real data from the GEUVADIS and GTEx studies, suggesting the possibility that similar effects are present in these consortia.


Subject(s)
Gene Expression Regulation , Quantitative Trait Loci , Chromosome Mapping/methods , Gene Expression Regulation/genetics , Quantitative Trait Loci/genetics , Phenotype , Protein Isoforms/genetics , Polymorphism, Single Nucleotide/genetics , Genome-Wide Association Study
6.
bioRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38045414

ABSTRACT

The term "RNA-seq" refers to a collection of assays based on sequencing experiments that involve quantifying RNA species from bulk tissue, from single cells, or from single nuclei. The kallisto, bustools, and kb-python programs are free, open-source software tools for performing this analysis that together can produce gene expression quantification from raw sequencing reads. The quantifications can be individualized for multiple cells, multiple samples, or both. Additionally, these tools allow gene expression values to be classified as originating from nascent RNA species or mature RNA species, making this workflow amenable to both cell-based and nucleus-based assays. This protocol describes in detail how to use kallisto and bustools in conjunction with a wrapper, kb-python, to preprocess RNA-seq data.

7.
Cell ; 186(2): 446-460.e19, 2023 01 19.
Article in English | MEDLINE | ID: mdl-36638795

ABSTRACT

Precise targeting of large transgenes to T cells using homology-directed repair has been transformative for adoptive cell therapies and T cell biology. Delivery of DNA templates via adeno-associated virus (AAV) has greatly improved knockin efficiencies, but the tropism of current AAV serotypes restricts their use to human T cells employed in immunodeficient mouse models. To enable targeted knockins in murine T cells, we evolved Ark313, a synthetic AAV that exhibits high transduction efficiency in murine T cells. We performed a genome-wide knockout screen and identified QA2 as an essential factor for Ark313 infection. We demonstrate that Ark313 can be used for nucleofection-free DNA delivery, CRISPR-Cas9-mediated knockouts, and targeted integration of large transgenes. Ark313 enables preclinical modeling of Trac-targeted CAR-T and transgenic TCR-T cells in immunocompetent models. Efficient gene targeting in murine T cells holds great potential for improved cell therapies and opens avenues in experimental T cell immunology.


Subject(s)
Dependovirus , Genetic Engineering , T-Lymphocytes , Animals , Mice , CRISPR-Cas Systems/genetics , Dependovirus/genetics , Gene Targeting , Genetic Engineering/methods
8.
Am J Hum Genet ; 109(7): 1286-1297, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35716666

ABSTRACT

Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Cholesterol, LDL , Gene Expression , Humans , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , White People/genetics
10.
Cell ; 180(2): 263-277.e20, 2020 01 23.
Article in English | MEDLINE | ID: mdl-31955845

ABSTRACT

Cytosine methylation of DNA is a widespread modification that plays numerous critical roles. In the yeast Cryptococcus neoformans, CG methylation occurs in transposon-rich repeats and requires the DNA methyltransferase Dnmt5. We show that Dnmt5 displays exquisite maintenance-type specificity in vitro and in vivo and utilizes in vivo cofactors similar to those of the metazoan maintenance methylase Dnmt1. Remarkably, phylogenetic and functional analysis revealed that the ancestral species lost the gene for a de novo methylase, DnmtX, between 50 and 150 million years ago. We examined how methylation has persisted since the ancient loss of DnmtX. Experimental and comparative studies reveal efficient replication of methylation patterns in C. neoformans, rare stochastic methylation loss and gain events, and the action of natural selection. We propose that an epigenome has been propagated for >50 million years through a process analogous to Darwinian evolution of the genome.


Subject(s)
Cryptococcus neoformans/genetics , DNA Methylation/genetics , Methyltransferases/genetics , Biological Evolution , Cryptococcus neoformans/metabolism , DNA/metabolism , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA Methylation/physiology , DNA Modification Methylases/genetics , DNA Transposable Elements/genetics , Epigenomics/methods , Evolution, Molecular , Genome/genetics , Methyltransferases/metabolism , Phylogeny
11.
Nat Biotechnol ; 36(11): 1056-1058, 2018 12.
Article in English | MEDLINE | ID: mdl-30114007

ABSTRACT

We present an in silico approach to identifying neoepitopes derived from intron retention events in tumor transcriptomes. Using mass spectrometry immunopeptidome analysis, we show that retained intron neoepitopes are processed and presented on MHC I on the surface of cancer cell lines. RNA-derived neoepitopes should be considered for prospective personalized cancer vaccine development.


Subject(s)
Computer Simulation , Epitopes/genetics , Introns/genetics , Models, Genetic , Neoplasms/genetics , Cancer Vaccines/immunology , Cell Line, Tumor , Epitope Mapping , Epitopes/metabolism , Humans , Neoplasms/immunology , Neoplasms/therapy , RNA/genetics
12.
Genome Biol ; 19(1): 53, 2018 04 12.
Article in English | MEDLINE | ID: mdl-29650040

ABSTRACT

Compared to RNA-sequencing transcript differential analysis, gene-level differential expression analysis is more robust and experimentally actionable. However, the use of gene counts for statistical analysis can mask transcript-level dynamics. We demonstrate that 'analysis first, aggregation second,' where the p values derived from transcript analysis are aggregated to obtain gene-level results, increases sensitivity and accuracy. The method we propose can also be applied to transcript compatibility counts obtained from pseudoalignment of reads, which circumvents the need for quantification and is fast, accurate, and model-free. The method generalizes to various levels of biology and we showcase an application to gene ontologies.
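
The 'analysis first, aggregation second' idea can be illustrated with a minimal sketch. The abstract does not say which aggregation rule the authors use, so Fisher's method is swapped in here purely as a stand-in; it assumes the transcript-level p values are independent, which may not hold in real data. The function names and the transcript-to-gene mapping are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (a stand-in rule)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p); we keep X/2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom has the closed
    # form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def aggregate_by_gene(transcript_p, transcript_to_gene):
    """Group transcript-level p-values by gene, then combine each group."""
    groups = {}
    for transcript, p in transcript_p.items():
        groups.setdefault(transcript_to_gene[transcript], []).append(p)
    return {gene: fisher_combine(ps) for gene, ps in groups.items()}
```

With this sketch, a gene whose two transcripts each have p = 0.05 receives a combined p value of about 0.017, showing how consistent transcript-level evidence can strengthen the gene-level result.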


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Animals , Gene Ontology , Mice
13.
Nat Methods ; 14(7): 687-690, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28581496

ABSTRACT

We describe sleuth (http://pachterlab.github.io/sleuth), a method for the differential analysis of gene expression data that utilizes bootstrapping in conjunction with response error linear modeling to decouple biological variance from inferential variance. sleuth is implemented in an interactive shiny app that utilizes kallisto quantifications and bootstraps for fast and accurate analysis of data from RNA-seq experiments.
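
The idea of separating biological variance from inferential variance using bootstraps can be sketched as follows. This is a simplified illustration under stated assumptions, not sleuth's implementation: it treats the variance of the point estimates across replicates as the total, estimates the inferential part as the mean within-sample bootstrap variance, and floors the difference at zero. It omits details of the published model such as any transformation or shrinkage, and the function name is hypothetical.

```python
from statistics import mean, variance


def decompose_variance(point_estimates, bootstrap_estimates):
    """Split across-replicate variance into inferential and biological parts.

    point_estimates: one abundance estimate per replicate.
    bootstrap_estimates: one list of bootstrap estimates per replicate.
    """
    # Total observed variance across replicates.
    total_var = variance(point_estimates)
    # Inferential variance: average bootstrap variance within each replicate.
    inferential_var = mean([variance(b) for b in bootstrap_estimates])
    # Whatever is left is attributed to biology; it cannot be negative.
    biological_var = max(total_var - inferential_var, 0.0)
    return total_var, inferential_var, biological_var
```

When the bootstrap variance is as large as the total variance, the sketch attributes none of the spread to biology, which is the situation where ignoring inferential uncertainty would be most misleading.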


Subject(s)
Computer Simulation , Gene Expression/physiology , RNA/genetics , Software , Base Sequence , Models, Biological
14.
PLoS One ; 12(4): e0175744, 2017.
Article in English | MEDLINE | ID: mdl-28448519

ABSTRACT

BACKGROUND: A recent study of the gene expression patterns of Zika virus (ZIKV) infected human neural progenitor cells (hNPCs) revealed transcriptional dysregulation and identified cell cycle-related pathways that are affected by infection. However, deeper exploration of the information present in the RNA-Seq data can be used to further elucidate the manner in which Zika infection of hNPCs affects the transcriptome, refining pathway predictions and revealing isoform-specific dynamics. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed data published by Tang et al. using state-of-the-art tools for transcriptome analysis. By accounting for the experimental design and estimating technical and inferential variance, we were able to pinpoint pathways affected by Zika infection that highlight Zika's neural tropism. The examination of differential genes reveals cases of isoform divergence. CONCLUSIONS: Transcriptome analysis of Zika-infected hNPCs has the potential to identify the molecular signatures of Zika-infected neural cells. These signatures may be useful for diagnostics and for the resolution of infection pathways that can be used to harvest specific targets for further study.


Subject(s)
Neural Stem Cells/metabolism , Transcriptome , Zika Virus/pathogenicity , Down-Regulation , Gene Expression Profiling , Humans , Neural Stem Cells/virology , Principal Component Analysis , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA/chemistry , RNA/isolation & purification , RNA/metabolism , Sequence Analysis, RNA , Signal Transduction , Up-Regulation , alpha7 Nicotinic Acetylcholine Receptor/genetics , alpha7 Nicotinic Acetylcholine Receptor/metabolism
15.
BMC Bioinformatics ; 17(1): 490, 2016 Dec 01.
Article in English | MEDLINE | ID: mdl-27905880

ABSTRACT

Increased emphasis on reproducibility of published research in the last few years has led to the large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is difficult to use in practice. We introduce a series of tools for processing and analyzing RNA-Seq data in the Sequence Read Archive, that together have allowed us to build an easily extendable resource for analysis of data underlying published papers. Our system makes the exploration of data easily accessible and usable without technical expertise. Our database and associated tools can be accessed at The Lair: http://pachterlab.github.io/lair .


Subject(s)
Databases, Nucleic Acid , Sequence Analysis, RNA/methods , Software , High-Throughput Nucleotide Sequencing/methods , Humans , Reproducibility of Results
17.
Nat Biotechnol ; 34(5): 525-7, 2016 05.
Article in English | MEDLINE | ID: mdl-27043002

ABSTRACT

We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.
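
The notion of finding the transcripts compatible with a read, without aligning individual bases, can be shown with a toy sketch. It assumes a plain dictionary from k-mers to transcript names and intersects the sets hit by a read's k-mers; kallisto's actual index and algorithm are more elaborate, and this is not its implementation. The function names, the toy sequences in the usage note, and the choice to skip k-mers absent from the reference are all assumptions made for illustration.

```python
def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with all indexed k-mers of a read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # toy choice: ignore k-mers not present in the reference
        # Intersect the running compatibility set with this k-mer's set.
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no transcript is consistent with every k-mer seen so far
    return compatible or set()
```

For example, with transcripts `{"tA": "ACGTACGGA", "tB": "ACGTACCTT"}` and `k = 4`, the read `"ACGTAC"` is compatible with both transcripts, while `"GTACGG"` is compatible only with `tA`.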


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , RNA/genetics , Sequence Analysis, RNA/methods , Software , Computer Simulation , Data Interpretation, Statistical , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/methods
18.
Nucleic Acids Res ; 44(2): 838-51, 2016 Jan 29.
Article in English | MEDLINE | ID: mdl-26531823

ABSTRACT

Differentiating erythroblasts execute a dynamic alternative splicing program shown here to include extensive and diverse intron retention (IR) events. Cluster analysis revealed hundreds of developmentally dynamic introns that exhibit increased IR in mature erythroblasts and are enriched in functions related to RNA processing, such as the spliceosomal factor SF3B1. Distinct, developmentally stable IR clusters are enriched in metal-ion binding functions and include mitoferrin genes SLC25A37 and SLC25A28 that are critical for iron homeostasis. Some IR transcripts are abundant, e.g. comprising ∼50% of highly expressed SLC25A37 and SF3B1 transcripts in late erythroblasts, thereby limiting functional mRNA levels. IR transcripts tested were predominantly nuclear-localized. Splice site strength correlated with IR among stable but not dynamic intron clusters, indicating distinct regulation of dynamically increased IR in late erythroblasts. Retained introns were preferentially associated with alternative exons with premature termination codons (PTCs). High IR was observed in disease-causing genes including SF3B1 and the RNA binding protein FUS. Comparative studies demonstrated that the intron retention program in erythroblasts shares features with other tissues but ultimately is unique to erythropoiesis. We conclude that IR is a multi-dimensional set of processes that post-transcriptionally regulate diverse gene groups during normal erythropoiesis, misregulation of which could be responsible for human disease.


Subject(s)
Erythroblasts/physiology , Erythropoiesis/genetics , Gene Expression Regulation , Introns , Cation Transport Proteins/genetics , Cell Differentiation/genetics , Cell Nucleus/genetics , Cells, Cultured , Cluster Analysis , Codon, Nonsense , Erythroblasts/cytology , Exons , Humans , Introns/genetics , Microfilament Proteins/genetics , Mitochondrial Proteins/genetics , Nonsense Mediated mRNA Decay , Phosphoproteins/genetics , RNA Splice Sites , RNA Splicing Factors , Ribonucleoprotein, U2 Small Nuclear/genetics , Spectrin/genetics
19.
Nucleic Acids Res ; 42(6): 4031-42, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24442673

ABSTRACT

Alternative pre-messenger RNA splicing remodels the human transcriptome in a spatiotemporal manner during normal development and differentiation. Here we explored the landscape of transcript diversity in the erythroid lineage by RNA-seq analysis of five highly purified populations of morphologically distinct human erythroblasts, representing the last four cell divisions before enucleation. In this unique differentiation system, we found evidence of an extensive and dynamic alternative splicing program encompassing genes with many diverse functions. Alternative splicing was particularly enriched in genes controlling cell cycle, organelle organization, chromatin function and RNA processing. Many alternative exons exhibited differentiation-associated switches in splicing efficiency, mostly in late-stage polychromatophilic and orthochromatophilic erythroblasts, in concert with extensive cellular remodeling that precedes enucleation. A subset of alternative splicing switches introduces premature translation termination codons into selected transcripts in a differentiation stage-specific manner, supporting the hypothesis that alternative splicing-coupled nonsense-mediated decay contributes to regulation of erythroid-expressed genes as a novel part of the overall differentiation program. We conclude that a highly dynamic alternative splicing program in terminally differentiating erythroblasts plays a major role in regulating gene expression to ensure synthesis of the appropriate proteome at each stage as the cells remodel in preparation for production of mature red cells.


Subject(s)
Alternative Splicing , Erythropoiesis/genetics , Cells, Cultured , Erythroblasts/metabolism , Erythroid Cells/cytology , Erythroid Cells/metabolism , Humans , Nonsense Mediated mRNA Decay , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Transcriptome
20.
Genome Biol ; 14(4): R36, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23618408

ABSTRACT

TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.


Subject(s)
Gene Duplication , Gene Fusion , Mutagenesis, Insertional , Sequence Alignment/methods , Software , Humans , Sensitivity and Specificity , Sequence Analysis, RNA/methods , Transcriptome