ABSTRACT
BACKGROUND: The results of high-throughput biology ('omic') experiments provide insight into biological mechanisms but can be challenging to explore, archive and share. The scale of these challenges continues to grow as omic research volume expands and multiple analytical technologies, bioinformatic pipelines, and visualization preferences have emerged. Multiple software applications exist that support omic study exploration and/or archival. However, an opportunity remains for open-source software that can archive and present the results of omic analyses with broad accommodation of study-specific analytical approaches and visualizations with useful exploration features. RESULTS: We present OmicNavigator, an R package for the archival, visualization and interactive exploration of omic studies. OmicNavigator enables bioinformaticians to create web applications that interactively display their custom visualizations and analysis results linked with app-derived analytical tools, graphics, and tables. Studies created with OmicNavigator can be viewed within an interactive R session or hosted on a server for shared access. CONCLUSIONS: OmicNavigator can be found at https://github.com/abbvie-external/OmicNavigator.
Subject(s)
Computational Biology , Software , Computational Biology/methods , User-Computer Interface , Computer Graphics
ABSTRACT
Cellular heterogeneity in gene expression is driven by cellular processes, such as cell cycle and cell-type identity, and cellular environment such as spatial location. The cell cycle, in particular, is thought to be a key driver of cell-to-cell heterogeneity in gene expression, even in otherwise homogeneous cell populations. Recent advances in single-cell RNA-sequencing (scRNA-seq) facilitate detailed characterization of gene expression heterogeneity and can thus shed new light on the processes driving heterogeneity. Here, we combined fluorescence imaging with scRNA-seq to measure cell cycle phase and gene expression levels in human induced pluripotent stem cells (iPSCs). By using these data, we developed a novel approach to characterize cell cycle progression. Although standard methods assign cells to discrete cell cycle stages, our method goes beyond this and quantifies cell cycle progression on a continuum. We found that, on average, scRNA-seq data from only five genes predicted a cell's position on the cell cycle continuum to within 14% of the entire cycle and that using more genes did not improve this accuracy. Our data and predictor of cell cycle phase can directly help future studies to account for cell cycle-related heterogeneity in iPSCs. Our results and methods also provide a foundation for future work to characterize the effects of the cell cycle on expression heterogeneity in other cell types.
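The "within 14% of the entire cycle" figure can be read as a circular error. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes cell cycle position is an angle in radians and that error is the shorter arc between predicted and observed positions, expressed as a fraction of a full cycle. The function name `circular_error_fraction` is hypothetical.

```python
import math


def circular_error_fraction(predicted, observed):
    """Shorter arc between two angles (radians), as a fraction of one full cycle.

    Assumption: positions on the cell cycle continuum are angles, so the
    largest possible error is half a cycle (0.5).
    """
    two_pi = 2 * math.pi
    diff = abs(predicted - observed) % two_pi
    arc = min(diff, two_pi - diff)  # go around the circle the short way
    return arc / two_pi


if __name__ == "__main__":
    # Two positions on opposite sides of the cycle are half a cycle apart.
    print(circular_error_fraction(0.0, math.pi))
```

Under this definition, positions just before and just after the wrap-around point count as close to each other, which a plain absolute difference would miss.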
Subject(s)
Cell Cycle/genetics , Computational Biology/methods , High-Throughput Nucleotide Sequencing , RNA Sequence Analysis , Single-Cell Analysis/methods , Cell Line , Gene Expression Profiling , Reporter Genes , High-Throughput Nucleotide Sequencing/methods , Humans , Induced Pluripotent Stem Cells/metabolism , RNA Sequence Analysis/methods
ABSTRACT
Quantification of gene expression levels at the single-cell level has revealed that gene expression can vary substantially even across a population of homogeneous cells. However, it is currently unclear what genomic features control variation in gene expression levels, and whether common genetic variants may impact gene expression variation. Here, we take a genome-wide approach to identify expression variance quantitative trait loci (vQTLs). To this end, we generated single-cell RNA-seq (scRNA-seq) data from induced pluripotent stem cells (iPSCs) derived from 53 Yoruba individuals. We collected data for a median of 95 cells per individual and a total of 5,447 single cells, and identified 235 mean expression QTLs (eQTLs) at 10% FDR, of which 79% replicate in bulk RNA-seq data from the same individuals. We further identified 5 vQTLs at 10% FDR, but demonstrate that these can also be explained as effects on mean expression. Our study suggests that dispersion QTLs (dQTLs), which could alter the variance of expression independently of the mean, can have larger fold changes but explain less phenotypic variance than eQTLs. We estimate 4,015 individuals as a lower bound to achieve 80% power to detect the strongest dQTLs in iPSCs. These results will guide the design of future studies on understanding the genetic control of gene expression variance.
Subject(s)
Induced Pluripotent Stem Cells/metabolism , Quantitative Trait Loci , Black People/genetics , Cell Line , Computer Simulation , Gene Expression Profiling , Genetic Variation , Genome-Wide Association Study , Humans , Genetic Models , Nigeria , Phenotype , RNA Sequence Analysis , Single-Cell Analysis
ABSTRACT
DNA methylation is an important epigenetic regulator of gene expression. Recent studies have revealed widespread associations between genetic variation and methylation levels. However, the mechanistic links between genetic variation and methylation remain unclear. To begin addressing this gap, we collected methylation data at ~300,000 loci in lymphoblastoid cell lines (LCLs) from 64 HapMap Yoruba individuals, and genome-wide bisulfite sequence data in ten of these individuals. We identified (at an FDR of 10%) 13,915 cis methylation QTLs (meQTLs), i.e., CpG sites in which changes in DNA methylation are associated with genetic variation at proximal loci. We found that meQTLs are frequently associated with changes in methylation at multiple CpGs across regions of up to 3 kb. Interestingly, meQTLs are also frequently associated with variation in other properties of gene regulation, including histone modifications, DNase I accessibility, chromatin accessibility, and expression levels of nearby genes. These observations suggest that genetic variants may lead to coordinated molecular changes in all of these regulatory phenotypes. One plausible driver of coordinated changes in different regulatory mechanisms is variation in transcription factor (TF) binding. Indeed, we found that SNPs that change predicted TF binding affinities are significantly enriched for associations with DNA methylation at nearby CpGs.
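For readers unfamiliar with calling associations "at an FDR of 10%", the sketch below shows one common way to control the false discovery rate: the Benjamini-Hochberg step-up procedure. This is a generic illustration only. The abstract does not say which FDR procedure was used, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans marking which tests are rejected at the given FDR.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    such that p_(k) <= (k / m) * fdr, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-specific threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values for five hypothetical SNP-CpG pairs.
    example = [0.001, 0.02, 0.04, 0.5, 0.9]
    print(benjamini_hochberg(example, fdr=0.10))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.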
Subject(s)
DNA Methylation , Gene Expression Regulation , Histones/metabolism , Quantitative Trait Loci , Transcription Factors/metabolism , Binding Sites , Transformed Cell Line , Computational Biology , Genome-Wide Association Study , Genomics/methods , Genotype , Humans , Phenotype , Single Nucleotide Polymorphism , Protein Binding
ABSTRACT
Making scientific analyses reproducible, well documented, and easily shareable is crucial to maximizing their impact and ensuring that others can build on them. However, accomplishing these goals is not easy, requiring careful attention to organization, workflow, and familiarity with tools that are not a regular part of every scientist's toolbox. We have developed an R package, workflowr, to help all scientists, regardless of background, overcome these challenges. Workflowr aims to instill a particular "workflow" (a sequence of steps to be repeated and integrated into research practice) that helps make projects more reproducible and accessible. This workflow integrates four key elements: (1) version control (via Git); (2) literate programming (via R Markdown); (3) automatic checks and safeguards that improve code reproducibility; and (4) sharing code and results via a browsable website. These features exploit powerful existing tools, whose mastery would take considerable study. However, the workflowr interface is simple enough that novice users can quickly enjoy its many benefits. By simply following the workflowr "workflow", R users can create projects whose results, figures, and development history are easily accessible on a static website, and thereby conveniently shareable with collaborators by sending them a URL, and are accompanied by source code and reproducibility safeguards. The workflowr R package is open source and available on CRAN, with full documentation and source code available at https://github.com/jdblischak/workflowr.
Subject(s)
Information Dissemination , Software , Workflow , Reproducibility of Results
ABSTRACT
Phosphorylation of proteins on serine, threonine, and tyrosine residues is a ubiquitous post-translational modification that plays a key part in essentially every cell signaling process. It is reasonable to assume that inter-individual variation in protein phosphorylation may underlie phenotypic differences, as has been observed for practically any other molecular regulatory phenotype. However, we do not know much about the extent of inter-individual variation in phosphorylation because it is quite challenging to perform a quantitative high-throughput study to assess inter-individual variation in any post-translational modification. To test our ability to address this challenge with SILAC-based mass spectrometry, we quantified phosphorylation levels for three genotyped human cell lines within a nested experimental framework, and found that genetic background is the primary determinant of phosphoproteome variation. We uncovered multiple functional, biophysical, and genetic associations with germline-driven phosphopeptide variation. Variants affecting protein levels or structure were among these associations, with the latter presenting, on average, a stronger effect. Interestingly, we found evidence that is consistent with a phosphopeptide variability buffering effect conferred by properties enriched within longer proteins. Because the small sample size in this 'pilot' study may limit the applicability of our genetic observations, we also undertook a thorough technical assessment of our experimental workflow to aid further efforts. Taken together, these results provide the foundation for future work to characterize inter-individual variation in post-translational modification levels and reveal novel insights into the nature of inter-individual variation in phosphorylation.
Subject(s)
Population Biological Variation/genetics , Phosphopeptides/metabolism , Phosphoproteins/metabolism , Post-Translational Protein Processing/genetics , Proteome/metabolism , Tumor Cell Line , High Pressure Liquid Chromatography/methods , Datasets as Topic , Genotype , Humans , Phosphorylation/genetics , Single Nucleotide Polymorphism , Proteomics/methods , Tandem Mass Spectrometry/methods
ABSTRACT
Anthracycline-induced cardiotoxicity (ACT) is a key limiting factor in setting optimal chemotherapy regimes, with almost half of patients expected to develop congestive heart failure given high doses. However, the genetic basis of sensitivity to anthracyclines remains unclear. We created a panel of iPSC-derived cardiomyocytes from 45 individuals and performed RNA-seq after 24 hr exposure to varying doxorubicin dosages. The transcriptomic response is substantial: the majority of genes are differentially expressed and over 6000 genes show evidence of differential splicing, the latter driven by reduced splicing fidelity in the presence of doxorubicin. We show that inter-individual variation in transcriptional response is predictive of in vitro cell damage, which in turn is associated with in vivo ACT risk. We detect 447 response-expression quantitative trait loci (QTLs) and 42 response-splicing QTLs, which are enriched in lower ACT GWAS p-values, supporting the in vivo relevance of our map of genetic regulation of cellular response to anthracyclines.
Subject(s)
Anthracyclines/toxicity , Cardiotoxicity , Cardiac Myocytes/drug effects , Cultured Cells , Doxorubicin/toxicity , Gene Expression Profiling , Genome-Wide Association Study , Humans , Quantitative Trait Loci , RNA Sequence Analysis
ABSTRACT
BACKGROUND: There is substantial interest in the evolutionary forces that shaped the regulatory framework in early human development. Progress in this area has been slow because it is difficult to obtain relevant biological samples. Induced pluripotent stem cells (iPSCs) may provide the ability to establish in vitro models of early human and non-human primate developmental stages. RESULTS: Using matched iPSC panels from humans and chimpanzees, we comparatively characterize gene regulatory changes through a four-day time course differentiation of iPSCs into primitive streak, endoderm progenitors, and definitive endoderm. As might be expected, we find that differentiation stage is the major driver of variation in gene expression levels, followed by species. We identify thousands of differentially expressed genes between humans and chimpanzees in each differentiation stage. Yet, when we consider gene-specific dynamic regulatory trajectories throughout the time course, we find that at least 75% of genes, including nearly all known endoderm developmental markers, have similar trajectories in the two species. Interestingly, we observe a marked reduction of both intra- and inter-species variation in gene expression levels in primitive streak samples compared to the iPSCs, with a recovery of regulatory variation in endoderm progenitors. CONCLUSIONS: The reduction of variation in gene expression levels at a specific developmental stage, paired with an overall high degree of conservation of temporal gene regulation, is consistent with the dynamics of a conserved developmental process.
Subject(s)
Cell Differentiation , Endoderm/cytology , Animals , Bayes Theorem , Cell Differentiation/genetics , Female , Gene Expression Profiling , Gene Expression Regulation , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Male , Pan troglodytes , Primitive Streak/metabolism , Time Factors
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single-cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples, to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.
Subject(s)
RNA/metabolism , Single-Cell Analysis , Gene Expression , High-Throughput Nucleotide Sequencing , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Principal Component Analysis , RNA/chemistry , RNA/isolation & purification , RNA Sequence Analysis
ABSTRACT
Tuberculosis (TB) is a deadly infectious disease, which kills millions of people every year. The causative pathogen, Mycobacterium tuberculosis (MTB), is estimated to have infected up to a third of the world's population; however, only approximately 10% of infected healthy individuals progress to active TB. Despite evidence for heritability, it is not currently possible to predict who may develop TB. To explore approaches to classify susceptibility to TB, we infected dendritic cells (DCs) with MTB, using cells from putatively resistant individuals diagnosed with latent TB and from susceptible individuals who had recovered from active TB. We measured gene expression levels in infected and non-infected cells and found hundreds of differentially expressed genes between susceptible and resistant individuals in the non-infected cells. We further found that genetic polymorphisms near the genes differentially expressed between susceptible and resistant individuals are more likely to be associated with TB susceptibility in published GWAS data. Lastly, we trained a classifier based on the gene expression levels in the non-infected cells, and demonstrated reasonable performance on our data and an independent data set. Overall, our promising results from this small study suggest that training a classifier on a larger cohort may enable us to accurately predict TB susceptibility.
Subject(s)
Dendritic Cells/microbiology , Gene Expression Profiling , Genetic Predisposition to Disease/genetics , Latent Tuberculosis/genetics , Tuberculosis/genetics , France , Humans , Latent Tuberculosis/blood , Latent Tuberculosis/microbiology , Male , Mycobacterium tuberculosis/physiology , Tuberculosis/blood , Tuberculosis/microbiology
ABSTRACT
The active form of vitamin D, 1,25-dihydroxyvitamin D3 (1,25D), plays an important immunomodulatory role, regulating transcription of genes in the innate and adaptive immune system. The present study examines patterns of transcriptome-wide response to 1,25D and bacterial lipopolysaccharide (LPS) in primary human monocytes, to elucidate pathways underlying the effects of 1,25D on the immune system. Monocytes obtained from healthy individuals of African-American and European-American ancestry were treated with 1,25D, LPS, or both simultaneously. The addition of 1,25D during stimulation with LPS induced significant upregulation of genes in the antimicrobial and autophagy pathways, and downregulation of proinflammatory response genes compared to LPS treatment alone. A joint Bayesian analysis enabled clustering of genes into patterns of shared transcriptional response across treatments. The biological pathways enriched within these expression patterns highlighted several mechanisms through which 1,25D could exert its immunomodulatory role. Pathways such as mTOR signaling, EIF2 signaling, IL-8 signaling, and Tec Kinase signaling were enriched among genes with opposite transcriptional responses to 1,25D and LPS, highlighting the important roles of these pathways in mediating the immunomodulatory activity of 1,25D. Furthermore, a subset of genes with evidence of interethnic differences in transcriptional response was also identified, suggesting that in addition to the well-established interethnic variation in circulating levels of vitamin D, the intensity of transcriptional response to 1,25D and LPS also varies between ethnic groups. We propose that dysregulation of the pathways identified in this study could contribute to immune-mediated disease risk.
Subject(s)
Gene Expression Regulation/drug effects , Lipopolysaccharides/pharmacology , Monocytes/drug effects , Monocytes/metabolism , Genetic Transcription/drug effects , Vitamin D/analogs & derivatives , Bayes Theorem , Binding Sites , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling , Humans , Lipopolysaccharides/immunology , Monocytes/immunology , Nucleotide Motifs , Protein Binding , Calcitriol Receptors/metabolism , Nucleic Acid Regulatory Sequences , Transcriptome , Vitamin D/pharmacology
ABSTRACT
The innate immune system provides the first response to infection and is now recognized to be partially pathogen-specific. Mycobacterium tuberculosis (MTB) is able to subvert the innate immune response and survive inside macrophages. Curiously, only 5-10% of otherwise healthy individuals infected with MTB develop active tuberculosis (TB). We do not yet understand the genetic basis underlying this individual-specific susceptibility. Moreover, we still do not know which properties of the innate immune response are specific to MTB infection. To identify immune responses that are specific to MTB, we infected macrophages with eight different bacteria, including different MTB strains and related mycobacteria, and studied their transcriptional response. We identified a novel subset of genes whose regulation was affected specifically by infection with mycobacteria. This subset includes genes involved in phagosome maturation, superoxide production, response to vitamin D, macrophage chemotaxis, and sialic acid synthesis. We suggest that genetic variants that affect the function or regulation of these genes should be considered candidate loci for explaining TB susceptibility.