Search | Secretaria de Estado da Saúde (State Health Secretariat)

1.

Nonnegative spatial factorization applied to spatial genomics.

Townes, F William; Engelhardt, Barbara E.

Nat Methods ; 20(2): 229-238, 2023 02.

Article in English | MEDLINE | ID: mdl-36587187

ABSTRACT

Nonnegative matrix factorization (NMF) is widely used to analyze high-dimensional count data because, in contrast to real-valued alternatives such as factor analysis, it produces an interpretable parts-based representation. However, in applications such as spatial transcriptomics, NMF fails to incorporate known structure between observations. Here, we present nonnegative spatial factorization (NSF), a spatially-aware probabilistic dimension reduction model based on transformed Gaussian processes that naturally encourages sparsity and scales to tens of thousands of observations. NSF recovers ground truth factors more accurately than real-valued alternatives such as MEFISTO in simulations, and has lower out-of-sample prediction error than probabilistic NMF on three spatial transcriptomics datasets from mouse brain and liver. Since not all patterns of gene expression have spatial correlations, we also propose a hybrid extension of NSF that combines spatial and nonspatial components, enabling quantification of spatial importance for both observations and features. A TensorFlow implementation of NSF is available from https://github.com/willtownes/nsf-paper .

Subjects

Algorithms, Gene Expression Profiling, Animals, Mice, Gene Expression Profiling/methods, Genomics, Statistical Models
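The abstract compares NSF against probabilistic NMF. As a minimal sketch of that nonspatial baseline only (not NSF, which additionally places transformed Gaussian process priors over spatial locations), the Poisson NMF below uses the standard Lee-Seung multiplicative updates in plain Python. The function name `poisson_nmf` and its parameters are hypothetical and are not taken from the nsf-paper repository.

```python
import random


def poisson_nmf(Y, k, n_iter=200, seed=0):
    """Factor a nonnegative count matrix Y (n x p) as Y ~ W H, W (n x k), H (k x p).

    Uses Lee-Seung multiplicative updates for the generalized KL divergence,
    which is equivalent to maximum-likelihood fitting of Y_ij ~ Poisson((W H)_ij).
    Nonnegativity is preserved because every update multiplies by a nonnegative ratio.
    """
    rng = random.Random(seed)
    n, p = len(Y), len(Y[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(p)] for _ in range(k)]
    eps = 1e-12  # guards against division by zero

    def reconstruct():
        return [[sum(W[i][f] * H[f][j] for f in range(k)) for j in range(p)]
                for i in range(n)]

    for _ in range(n_iter):
        # Update all of H from the current reconstruction (simultaneous update).
        R = reconstruct()
        for f in range(k):
            w_sum = sum(W[i][f] for i in range(n)) + eps
            for j in range(p):
                ratio = sum(W[i][f] * Y[i][j] / (R[i][j] + eps) for i in range(n))
                H[f][j] *= ratio / w_sum
        # Update all of W from the reconstruction that uses the new H.
        R = reconstruct()
        for i in range(n):
            for f in range(k):
                h_sum = sum(H[f][j] for j in range(p)) + eps
                ratio = sum(H[f][j] * Y[i][j] / (R[i][j] + eps) for j in range(p))
                W[i][f] *= ratio / h_sum
    return W, H
```

On an exactly rank-1 count matrix with k = 1, a single update pass already reproduces the matrix, because the updates reduce to row and column totals.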

2.

Alignment of spatial genomics data using deep Gaussian processes.

Jones, Andrew; Townes, F William; Li, Didong; Engelhardt, Barbara E.

Nat Methods ; 20(9): 1379-1387, 2023 09.

Article in English | MEDLINE | ID: mdl-37592182

ABSTRACT

Spatially resolved genomic technologies have allowed us to study the physical organization of cells and tissues, and promise an understanding of local interactions between cells. However, it remains difficult to precisely align spatial observations across slices, samples, scales, individuals and technologies. Here, we propose a probabilistic model that aligns spatially-resolved samples onto a known or unknown common coordinate system (CCS) with respect to phenotypic readouts (for example, gene expression). Our method, Gaussian Process Spatial Alignment (GPSA), consists of a two-layer Gaussian process: the first layer maps observed samples' spatial locations onto a CCS, and the second layer maps from the CCS to the observed readouts. Our approach enables complex downstream spatially aware analyses that are impossible or inaccurate with unaligned data, including an analysis of variance, creation of a dense three-dimensional (3D) atlas from sparse two-dimensional (2D) slices or association tests across data modalities.

Subjects

Genomics, Statistical Models, Humans, Normal Distribution

3.

A self-exciting point process to study multicellular spatial signaling patterns.

Verma, Archit; Jena, Siddhartha G; Isakov, Danielle R; Aoki, Kazuhiro; Toettcher, Jared E; Engelhardt, Barbara E.

Proc Natl Acad Sci U S A ; 118(32), 2021 08 10.

Article in English | MEDLINE | ID: mdl-34362843

ABSTRACT

Multicellular organisms rely on spatial signaling among cells to drive their organization, development, and response to stimuli. Several models have been proposed to capture the behavior of spatial signaling in multicellular systems, but existing approaches fail to capture both the autonomous behavior of single cells and the interactions of a cell with its neighbors simultaneously. We propose a spatiotemporal model of dynamic cell signaling based on Hawkes processes (self-exciting point processes) that model the signaling processes within a cell and spatial couplings between cells. With this cellular point process (CPP), we capture both the single-cell pathway activation rate and the magnitude and duration of signaling between cells relative to their spatial location. Furthermore, our model captures tissues composed of heterogeneous cell types with different bursting rates and signaling behaviors across multiple signaling proteins. We apply our model to epithelial cell systems that exhibit a range of autonomous and spatial signaling behaviors basally and under pharmacological exposure. Our model identifies known drug-induced signaling deficits, characterizes signaling changes across a wound front, and generalizes to multichannel observations.

Subjects

Keratinocytes/metabolism, Biological Models, Signal Transduction, Animals, Dipeptides/pharmacology, Dogs, Epithelial Cells, Hydroxamic Acids/pharmacology, Keratinocytes/cytology, Keratinocytes/drug effects, MAP Kinase Signaling System/drug effects, Madin Darby Canine Kidney Cells, Inbred Mouse Strains, Transgenic Mice, Statistical Models, Protein Kinase Inhibitors/pharmacology, Signal Transduction/drug effects, Spatio-Temporal Analysis
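To make the idea of a self-exciting process concrete, here is a sketch of the conditional intensity of one cell, assuming an exponential temporal kernel and an exponential decay of cell-to-cell coupling with distance. These kernel choices and all names (`cell_intensity`, `alpha_self`, `alpha_nbr`, `length_scale`) are illustrative assumptions, not the parameterization used in the CPP paper.

```python
import math


def cell_intensity(i, t, events, positions, mu, alpha_self, alpha_nbr, beta, length_scale):
    """Conditional intensity lambda_i(t) of cell i under a simple spatial Hawkes model.

    events:    dict mapping cell id -> list of past event (pulse) times
    positions: dict mapping cell id -> (x, y) coordinates
    Each past event at time t_k adds weight * beta * exp(-beta * (t - t_k)); the weight
    is alpha_self for the cell's own events and alpha_nbr * exp(-d / length_scale)
    for events in another cell at distance d (hypothetical parameterization).
    """
    rate = mu  # baseline (autonomous) activation rate
    xi, yi = positions[i]
    for j, times in events.items():
        if j == i:
            weight = alpha_self
        else:
            xj, yj = positions[j]
            weight = alpha_nbr * math.exp(-math.hypot(xi - xj, yi - yj) / length_scale)
        for tk in times:
            if tk < t:  # only strictly past events excite the process
                rate += weight * beta * math.exp(-beta * (t - tk))
    return rate
```

A pulse in a nearby cell raises the intensity more than the same pulse in a distant cell, and each contribution fades at rate `beta`.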

4.

netNMF-sc: leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis.

Elyanow, Rebecca; Dumitrascu, Bianca; Engelhardt, Barbara E; Raphael, Benjamin J.

Genome Res ; 30(2): 195-204, 2020 02.

Article in English | MEDLINE | ID: mdl-31992614

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be near each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.

Subjects

Genetic Epistasis/genetics, RNA-Seq, Single-Cell Analysis/methods, Software, Cluster Analysis, Gene Expression Profiling, Humans, Exome Sequencing
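A sketch of the kind of objective described for netNMF-sc, assuming a squared reconstruction error on (optionally masked) entries plus a graph-Laplacian penalty on the gene factors. The exact netNMF-sc objective and its treatment of dropouts may differ; `laplacian_penalty` and `network_nmf_objective` are hypothetical names.

```python
def laplacian_penalty(W, edges):
    """tr(W^T L W) for an undirected, weighted gene-gene network.

    W: list of per-gene latent vectors (genes x k).
    edges: list of (gene_a, gene_b, weight), each undirected edge listed once.
    Equals the sum over edges of weight * ||w_a - w_b||^2, so interacting genes
    are pulled toward each other in the latent space.
    """
    return sum(w * sum((x - y) ** 2 for x, y in zip(W[a], W[b])) for a, b, w in edges)


def network_nmf_objective(X, W, H, edges, lam, mask=None):
    """Squared reconstruction error on kept entries plus lam * network penalty.

    X: genes x cells, W: genes x k, H: k x cells; mask[g][c] == 0 drops an entry
    (e.g. a suspected dropout) from the reconstruction error.
    """
    k = len(H)
    err = 0.0
    for g, row in enumerate(X):
        for c, x in enumerate(row):
            if mask is None or mask[g][c]:
                pred = sum(W[g][f] * H[f][c] for f in range(k))
                err += (x - pred) ** 2
    return err + lam * laplacian_penalty(W, edges)
```

Raising `lam` trades reconstruction accuracy for latent gene vectors that agree more closely with the prior network.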

5.

Genetic effects on gene expression across human tissues.

Battle, Alexis; Brown, Christopher D; Engelhardt, Barbara E; Montgomery, Stephen B.

Nature ; 550(7675): 204-213, 2017 10 11.

Article in English | MEDLINE | ID: mdl-29022597

ABSTRACT

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

Subjects

Gene Expression Profiling, Gene Expression Regulation/genetics, Genetic Variation, Organ Specificity/genetics, Alleles, Human Chromosomes/genetics, Disease/genetics, Female, Human Genome/genetics, Genotype, Humans, Male, Quantitative Trait Loci/genetics

6.

Towards 'end-to-end' analysis and understanding of biological timecourse data.

Jena, Siddhartha G; Goglia, Alexander G; Engelhardt, Barbara E.

Biochem J ; 479(11): 1257-1263, 2022 06 17.

Article in English | MEDLINE | ID: mdl-35713413

ABSTRACT

Petabytes of increasingly complex and multidimensional live cell and tissue imaging data are generated every year. These videos hold large promise for understanding biology at a deep and fundamental level, as they capture single-cell and multicellular events occurring over time and space. However, the current modalities for analysis and mining of these data are scattered and user-specific, preventing more unified analyses from being performed over different datasets and obscuring possible scientific insights. Here, we propose a unified pipeline for storage, segmentation, analysis, and statistical parametrization of live cell imaging datasets.

Subjects

Datasets as Topic

7.

A Poisson reduced-rank regression model for association mapping in sequencing data.

Fitzgerald, Tiana; Jones, Andrew; Engelhardt, Barbara E.

BMC Bioinformatics ; 23(1): 529, 2022 Dec 08.

Article in English | MEDLINE | ID: mdl-36482321

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. RESULTS: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. CONCLUSION: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
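One common way to write a Poisson reduced-rank regression, assumed here for illustration, is y_ij ~ Poisson(exp(eta_ij)) with eta = X U V^T, where the p x q coefficient matrix is constrained to rank r through U (p x r) and V (q x r). The sketch below only evaluates the log-likelihood; fitting (for example by gradient ascent or variational inference) and any library-size offsets are omitted, and the function names are hypothetical.

```python
import math


def rrr_log_rates(X, U, V):
    """Linear predictor eta = X U V^T for a rank-r coefficient matrix B = U V^T.

    X: n x p covariates, U: p x r, V: q x r (one row per gene). Returns n x q log-rates.
    """
    r = len(U[0])
    # Project each sample's covariates into the r-dimensional latent space: Z = X U.
    Z = [[sum(x[a] * U[a][k] for a in range(len(x))) for k in range(r)] for x in X]
    return [[sum(z[k] * v[k] for k in range(r)) for v in V] for z in Z]


def poisson_rrr_loglik(Y, X, U, V):
    """Poisson log-likelihood: sum_ij [y_ij * eta_ij - exp(eta_ij) - log(y_ij!)]."""
    eta = rrr_log_rates(X, U, V)
    total = 0.0
    for y_row, e_row in zip(Y, eta):
        for y, e in zip(y_row, e_row):
            total += y * e - math.exp(e) - math.lgamma(y + 1)
    return total
```

The rank constraint is what makes the model low-dimensional: the coefficients need r(p + q) parameters instead of pq.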

8.

Causal network inference from gene transcriptional time-series response to glucocorticoids.

Lu, Jonathan; Dumitrascu, Bianca; McDowell, Ian C; Jo, Brian; Barrera, Alejandro; Hong, Linda K; Leichter, Sarah M; Reddy, Timothy E; Engelhardt, Barbara E.

PLoS Comput Biol ; 17(1): e1008223, 2021 01.

Article in English | MEDLINE | ID: mdl-33513136

ABSTRACT

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.

Subjects

Glucocorticoids/pharmacology, Statistical Models, Transcriptome/drug effects, A549 Cells, Algorithms, Computational Biology, Humans, Lung/chemistry, Lung/metabolism, Machine Learning, Software, Transcriptome/genetics
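BETS is described as elastic net regression on lagged expression (Granger causality) combined with bootstrap stability selection. A minimal sketch of the regression step only, in plain Python with coordinate descent, assuming a lag of one time point and no intercept; bootstrapping, stability selection, and FDR control are not shown, and `lagged_design` and `elastic_net_cd` are hypothetical helpers, not functions from the BETS package.

```python
import random


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1 / 2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||^2).
    No intercept is fitted, so X and y are assumed to be roughly centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def lagged_design(series, target, lag=1):
    """Regress series[target] at time t on every series' value at time t - lag."""
    names = sorted(series)
    T = len(series[target])
    X = [[series[g][t - lag] for g in names] for t in range(lag, T)]
    y = [series[target][t] for t in range(lag, T)]
    return names, X, y


if __name__ == "__main__":
    # Toy data: B at time t is driven by A at time t - 1.
    rng = random.Random(1)
    A = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    B = [0.0] + [0.8 * a for a in A[:-1]]
    names, X, y = lagged_design({"A": A, "B": B}, target="B")
    print(dict(zip(names, elastic_net_cd(X, y, lam=0.01))))
```

A clearly nonzero coefficient for A when predicting B is read, in Granger terms, as evidence that past A helps predict B; in BETS such edges are further filtered by how often they are selected across bootstrap samples.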

9.

Brain kernel: A new spatial covariance function for fMRI data.

Wu, Anqi; Nastase, Samuel A; Baldassano, Christopher A; Turk-Browne, Nicholas B; Norman, Kenneth A; Engelhardt, Barbara E; Pillow, Jonathan W.

Neuroimage ; 245: 118580, 2021 12 15.

Article in English | MEDLINE | ID: mdl-34740792

ABSTRACT

A key problem in functional magnetic resonance imaging (fMRI) is to estimate spatial activity patterns from noisy high-dimensional signals. Spatial smoothing provides one approach to regularizing such estimates. However, standard smoothing methods ignore the fact that correlations in neural activity may fall off at different rates in different brain areas, or exhibit discontinuities across anatomical or functional boundaries. Moreover, such methods do not exploit the fact that widely separated brain regions may exhibit strong correlations due to bilateral symmetry or the network organization of brain regions. To capture this non-stationary spatial correlation structure, we introduce the brain kernel, a continuous covariance function for whole-brain activity patterns. We define the brain kernel in terms of a continuous nonlinear mapping from 3D brain coordinates to a latent embedding space, parametrized with a Gaussian process (GP). The brain kernel specifies the prior covariance between voxels as a function of the distance between their locations in embedding space. The GP mapping warps the brain nonlinearly so that highly correlated voxels are close together in latent space, and uncorrelated voxels are far apart. We estimate the brain kernel using resting-state fMRI data, and we develop an exact, scalable inference method based on block coordinate descent to overcome the challenges of high dimensionality (10-100K voxels). Finally, we illustrate the brain kernel's usefulness with applications to brain decoding and factor analysis with multiple task-based fMRI datasets.

Subjects

Brain Mapping/methods, Computer-Assisted Image Processing/methods, Magnetic Resonance Imaging/methods, Neuroimaging/methods, Humans, Three-Dimensional Imaging
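The brain kernel's central idea, a stationary kernel applied after a nonlinear warp of voxel coordinates, can be sketched with a squared-exponential kernel. In the paper the warp is learned (parametrized with a GP) from resting-state data; here `mirror_embed` is a hand-written, hypothetical warp that folds the two hemispheres together to mimic the bilateral-symmetry correlations the abstract mentions.

```python
import math


def brain_kernel(v1, v2, embed, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance evaluated on embedded voxel coordinates."""
    e1, e2 = embed(v1), embed(v2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(e1, e2))
    return variance * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def identity_embed(v):
    """Stationary baseline: covariance depends only on 3D distance."""
    return v


def mirror_embed(v):
    """Hypothetical warp that folds the hemispheres onto each other (x -> |x|)."""
    x, y, z = v
    return (abs(x), y, z)
```

With the identity embedding, mirror-image voxels 80 units apart are almost uncorrelated; after the fold they coincide in the embedding space and receive the maximal prior covariance.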

10.

Glucocorticoid receptor recruits to enhancers and drives activation by motif-directed binding.

McDowell, Ian C; Barrera, Alejandro; D'Ippolito, Anthony M; Vockley, Christopher M; Hong, Linda K; Leichter, Sarah M; Bartelt, Luke C; Majoros, William H; Song, Lingyun; Safi, Alexias; Koçak, D Dewran; Gersbach, Charles A; Hartemink, Alexander J; Crawford, Gregory E; Engelhardt, Barbara E; Reddy, Timothy E.

Genome Res ; 28(9): 1272-1284, 2018 09.

Article in English | MEDLINE | ID: mdl-30097539

ABSTRACT

Glucocorticoids are potent steroid hormones that regulate immunity and metabolism by activating the transcription factor (TF) activity of glucocorticoid receptor (GR). Previous models have proposed that DNA binding motifs and sites of chromatin accessibility predetermine GR binding and activity. However, there are vast excesses of both features relative to the number of GR binding sites. Thus, these features alone are unlikely to account for the specificity of GR binding and activity. To identify genomic and epigenetic contributions to GR binding specificity and the downstream changes resulting from GR binding, we performed hundreds of genome-wide measurements of TF binding, epigenetic state, and gene expression across a 12-h time course of glucocorticoid exposure. We found that glucocorticoid treatment induces GR to bind to nearly all pre-established enhancers within minutes. However, GR binds to only a small fraction of the set of accessible sites that lack enhancer marks. Once GR is bound to enhancers, a combination of enhancer motif composition and interactions between enhancers then determines the strength and persistence of GR binding, which consequently correlates with dramatic shifts in enhancer activation. Over the course of several hours, highly coordinated changes in TF binding and histone modification occupancy occur specifically within enhancers, and these changes correlate with changes in the expression of nearby genes. Following GR binding, changes in the binding of other TFs precede changes in chromatin accessibility, suggesting that other TFs are also sensitive to genomic features beyond that of accessibility.

Subjects

Genetic Enhancer Elements, Histone Code, Nucleotide Motifs, Glucocorticoid Receptors/metabolism, Transcriptional Activation, Tumor Cell Line, Genetic Epigenesis, Humans, Protein Binding, Transcription Factors/metabolism

11.

A robust nonlinear low-dimensional manifold for single cell RNA-seq data.

Verma, Archit; Engelhardt, Barbara E.

BMC Bioinformatics ; 21(1): 324, 2020 Jul 21.

Article in English | MEDLINE | ID: mdl-32693778

ABSTRACT

BACKGROUND: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. RESULTS: Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student's t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model's ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high throughput experiments on available experimental data. CONCLUSION: We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.

Subjects

Nonlinear Dynamics, RNA-Seq, Single-Cell Analysis/methods, Blood Cells/metabolism, Gene Expression Regulation, Humans, Genetic Models, Neurons/metabolism, Normal Distribution, Principal Component Analysis, Time Factors

12.

Detecting differential growth of microbial populations with Gaussian process regression.

Tonner, Peter D; Darnell, Cynthia L; Engelhardt, Barbara E; Schmid, Amy K.

Genome Res ; 27(2): 320-333, 2017 02.

Article in English | MEDLINE | ID: mdl-27864351

ABSTRACT

Microbial growth curves are used to study differential effects of media, genetics, and stress on microbial population growth. Consequently, many modeling frameworks exist to capture microbial population growth measurements. However, current models are designed to quantify growth under conditions for which growth has a specific functional form. Extensions to these models are required to quantify the effects of perturbations, which often exhibit nonstandard growth curves. Rather than assume specific functional forms for experimental perturbations, we developed a general and robust model of microbial population growth curves using Gaussian process (GP) regression. GP regression modeling of high-resolution time-series growth data enables accurate quantification of population growth and allows explicit control of effects from other covariates such as genetic background. This framework substantially outperforms commonly used microbial population growth models, particularly when modeling growth data from environmentally stressed populations. We apply the GP growth model and develop statistical tests to quantify the differential effects of environmental perturbations on microbial growth across a large compendium of genotypes in archaea and yeast. This method accurately identifies known transcriptional regulators and implicates novel regulators of growth under standard and stress conditions in the model archaeal organism Halobacterium salinarum. For yeast, our method correctly identifies known phenotypes for a diversity of genetic backgrounds under cycloheximide stress and also detects previously unidentified oxidative stress sensitivity across a subset of strains. Together, these results demonstrate that the GP models are interpretable, recapitulating biological knowledge of growth response while providing new insights into the relevant parameters affecting microbial population growth.

Subjects

Halobacterium salinarum/growth & development, Biological Models, Yeasts/growth & development, Halobacterium salinarum/genetics, Normal Distribution, Phenotype, Yeasts/genetics
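A minimal sketch of GP regression on a single growth curve, assuming a zero prior mean, a squared-exponential kernel with fixed hyperparameters, and Gaussian observation noise. The cited model additionally includes covariates such as genetic background and estimates its hyperparameters, which this sketch omits; all function names are hypothetical.

```python
import math


def rbf(a, b, variance, length_scale):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(t_train, y_train, t_test, variance=1.0, length_scale=1.0, noise=1e-2):
    """Posterior mean k_*^T (K + noise * I)^{-1} y at each test time."""
    n = len(t_train)
    K = [[rbf(t_train[i], t_train[j], variance, length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(K), y_train)
    return [sum(rbf(t, t_train[i], variance, length_scale) * alpha[i] for i in range(n))
            for t in t_test]
```

Near the data the posterior mean follows the observed curve without assuming a logistic or Gompertz shape; far from the data it reverts to the zero prior mean.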

13.

Co-expression networks reveal the tissue-specific regulation of transcription and splicing.

Saha, Ashis; Kim, Yungil; Gewirtz, Ariel D H; Jo, Brian; Gao, Chuan; McDowell, Ian C; Engelhardt, Barbara E; Battle, Alexis.

Genome Res ; 27(11): 1843-1858, 2017 11.

Article in English | MEDLINE | ID: mdl-29021288

ABSTRACT

Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.

Subjects

Gene Expression Profiling/methods, Gene Regulatory Networks, RNA Splicing, RNA Sequence Analysis/methods, Bayes Theorem, Genetic Databases, Gene Expression Regulation, Genotyping Techniques, Humans, Organ Specificity, Single Nucleotide Polymorphism

14.

Statistical tests for detecting variance effects in quantitative trait studies.

Dumitrascu, Bianca; Darnell, Gregory; Ayroles, Julien; Engelhardt, Barbara E.

Bioinformatics ; 35(2): 200-210, 2019 01 15.

Article in English | MEDLINE | ID: mdl-29982387

ABSTRACT

Motivation: Identifying variants, both discrete and continuous, that are associated with quantitative traits, or QTs, is the primary focus of quantitative genetics. Most current methods are limited to identifying mean effects, or associations between genotype or covariates and the mean value of a quantitative trait. It is possible, however, that a variant may affect the variance of the quantitative trait in lieu of, or in addition to, affecting the trait mean. Here, we develop a general methodology to identify covariates with variance effects on a quantitative trait using a Bayesian heteroskedastic linear regression model (BTH). We compare BTH with existing methods to detect variance effects across a large range of simulations drawn from scenarios common to the analysis of quantitative traits. Results: We find that BTH and a double generalized linear model (dglm) outperform classical tests used for detecting variance effects in recent genomic studies. We show BTH and dglm are less likely to generate spurious discoveries through simulations and application to identifying methylation variance QTs and expression variance QTs. We identify four variance effects of sex in the Cardiovascular and Pharmacogenetics study. Our work is the first to offer a comprehensive view of methodology for identifying variance effects. We identify shortcomings in previously used methodology and provide a more conservative and robust alternative. We extend variance effect analysis to a wide array of covariates that enables a new statistical dimension in the study of sex- and age-specific quantitative trait effects. Availability and implementation: https://github.com/b2du/bth. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Bayes Theorem, Genomics/methods, Linear Models, Genetic Models, Quantitative Trait Loci, Analysis of Variance, Computational Biology, Humans, Phenotype

15.

Sparse multi-output Gaussian processes for online medical time series prediction.

Cheng, Li-Fang; Dumitrascu, Bianca; Darnell, Gregory; Chivers, Corey; Draugelis, Michael; Li, Kai; Engelhardt, Barbara E.

BMC Med Inform Decis Mak ; 20(1): 152, 2020 07 08.

Article in English | MEDLINE | ID: mdl-32641134

ABSTRACT

BACKGROUND: For real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from observational large-scale electronic health records (EHRs) and make accurate real-time predictions is a critical step. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring. METHODS: We propose MedGP, a statistical framework that incorporates 24 clinical covariates and supports a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel to enable tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in patient observations. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates. RESULTS: We evaluate and compare results from MedGP on the task of online prediction for three patient subgroups from two medical data sets across 8,043 patients. We find MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals. CONCLUSIONS: The MedGP framework is robust and efficient in estimating the temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The publicly available code is at https://github.com/bee-hive/MedGP .

Subjects

Algorithms, Statistical Models, Bayes Theorem, Normal Distribution
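MedGP's kernel is described as a highly structured sparse multi-output kernel. As a simpler, standard building block for correlating several clinical covariates over time (not MedGP's actual kernel), the intrinsic coregionalization model below multiplies an output-correlation matrix B = W W^T + diag(kappa) by a squared-exponential time kernel; `coregionalized_kernel`, `W`, and `kappa` are illustrative names.

```python
import math


def coregionalized_kernel(x1, x2, W, kappa, length_scale=1.0):
    """Intrinsic coregionalization model (ICM) kernel for a multi-output GP.

    Each input is x = (time, output_index). The covariance is
    Cov((t, i), (t', j)) = B[i][j] * k(t, t'), where B = W W^T + diag(kappa)
    encodes correlations among outputs (e.g. clinical covariates) and k is a
    squared-exponential kernel over time.
    """
    t1, i = x1
    t2, j = x2
    b_ij = sum(a * b for a, b in zip(W[i], W[j])) + (kappa[i] if i == j else 0.0)
    return b_ij * math.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))
```

A nonzero off-diagonal entry of B is what lets an observation of one covariate inform predictions for another, which is the kind of relationship among clinical covariates the abstract describes exploiting.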

16.

Clustering gene expression time series data using an infinite Gaussian process mixture model.

McDowell, Ian C; Manandhar, Dinesh; Vockley, Christopher M; Schmid, Amy K; Reddy, Timothy E; Engelhardt, Barbara E.

PLoS Comput Biol ; 14(1): e1005896, 2018 01.

Article in English | MEDLINE | ID: mdl-29337990

ABSTRACT

Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.

Subjects

Cluster Analysis, Neoplastic Gene Expression Regulation, Lung Neoplasms/genetics, A549 Cells, Algorithms, Tumor Cell Line, Computational Biology, Computer Simulation, Dexamethasone/chemistry, Gene Expression Profiling, Glucocorticoids/chemistry, Histones/chemistry, Humans, Hydrogen Bonding, Hydrogen Peroxide/chemistry, Lung Neoplasms/drug therapy, Biological Models, Normal Distribution, Oligonucleotide Array Sequence Analysis, RNA Sequence Analysis, Time Factors, Transcription Factors/chemistry
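In DPGP, the Dirichlet process is what lets the number of clusters be inferred rather than fixed in advance. A sketch of drawing cluster assignments from its Chinese restaurant process representation follows; it covers only the prior, whereas DPGP combines it with a Gaussian process likelihood for each cluster's trajectory during inference. `sample_crp` is a hypothetical name.

```python
import random


def sample_crp(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process (DP prior).

    Item i joins an existing cluster with probability proportional to its size,
    or opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    labels, counts = [], []
    for i in range(n):
        u = rng.random() * (i + alpha)
        if not counts or u < alpha:
            labels.append(len(counts))  # open a new cluster
            counts.append(1)
            continue
        u -= alpha
        for k, c in enumerate(counts):
            if u < c:
                break
            u -= c
        else:
            k = len(counts) - 1  # guard against floating-point round-off
        labels.append(k)
        counts[k] += 1
    return labels
```

Larger `alpha` produces more clusters a priori; for n much larger than alpha, the expected number of clusters grows roughly like alpha log n.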

17.

A statin-dependent QTL for GATM expression is associated with statin-induced myopathy.

Mangravite, Lara M; Engelhardt, Barbara E; Medina, Marisa W; Smith, Joshua D; Brown, Christopher D; Chasman, Daniel I; Mecham, Brigham H; Howie, Bryan; Shim, Heejung; Naidoo, Devesh; Feng, QiPing; Rieder, Mark J; Chen, Yii-Der I; Rotter, Jerome I; Ridker, Paul M; Hopewell, Jemma C; Parish, Sarah; Armitage, Jane; Collins, Rory; Wilke, Russell A; Nickerson, Deborah A; Stephens, Matthew; Krauss, Ronald M.

Nature ; 502(7471): 377-80, 2013 Oct 17.

Article in English | MEDLINE | ID: mdl-23995691

ABSTRACT

Statins are prescribed widely to lower plasma low-density lipoprotein (LDL) concentrations and cardiovascular disease risk and have been shown to have beneficial effects in a broad range of patients. However, statins are associated with an increased risk, albeit small, of clinical myopathy and type 2 diabetes. Despite evidence for substantial genetic influence on LDL concentrations, pharmacogenomic trials have failed to identify genetic variations with large effects on either statin efficacy or toxicity, and have produced little information regarding mechanisms that modulate statin response. Here we identify a downstream target of statin treatment by screening for the effects of in vitro statin exposure on genetic associations with gene expression levels in lymphoblastoid cell lines derived from 480 participants of a clinical trial of simvastatin treatment. This analysis identified six expression quantitative trait loci (eQTLs) that interacted with simvastatin exposure, including rs9806699, a cis-eQTL for the gene glycine amidinotransferase (GATM) that encodes the rate-limiting enzyme in creatine synthesis. We found this locus to be associated with incidence of statin-induced myotoxicity in two separate populations (meta-analysis odds ratio = 0.60). Furthermore, we found that GATM knockdown in hepatocyte-derived cell lines attenuated transcriptional response to sterol depletion, demonstrating that GATM may act as a functional link between statin-mediated lowering of cholesterol and susceptibility to statin-induced myopathy.

Subjects

Amidinotransferases/genetics; Gene Expression Regulation/drug effects; Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects; Muscular Diseases/chemically induced; Quantitative Trait Loci/genetics; Simvastatin/adverse effects; Amidinotransferases/deficiency; Amidinotransferases/metabolism; Cell Line; Cholesterol/deficiency; Cholesterol/metabolism; Cholesterol/pharmacology; Gene Knockdown Techniques; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology; Lymphocytes/cytology; Lymphocytes/drug effects; Lymphocytes/metabolism; Muscular Diseases/genetics; Muscular Diseases/metabolism; Polymorphism, Single Nucleotide/genetics; Simvastatin/pharmacology; Sterol Regulatory Element Binding Proteins/metabolism; Transcription, Genetic/drug effects
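The abstract reports a meta-analysis odds ratio of 0.60 across two populations but does not state the pooling method. As an illustration only, a common choice is fixed-effect inverse-variance weighting of log odds ratios, sketched below under that assumption. The function name is hypothetical, and the study-level inputs in the usage line are placeholders, not the paper's data.

```python
import math


def pool_odds_ratios(odds_ratios, ci_lows, ci_highs, z=1.959964):
    """Fixed-effect inverse-variance pooling of odds ratios.

    Each study's standard error of log(OR) is recovered from its 95% CI as
    (log(high) - log(low)) / (2 * z). Returns (pooled OR, CI low, CI high).
    """
    log_ors = [math.log(o) for o in odds_ratios]
    ses = [(math.log(h) - math.log(l)) / (2 * z) for l, h in zip(ci_lows, ci_highs)]
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se))


# Hypothetical inputs for two studies (not the values from the paper).
or_pooled, lo, hi = pool_odds_ratios([0.5, 0.7], [0.3, 0.45], [0.83, 1.09])
```

A random-effects model would be the usual alternative when between-study heterogeneity is expected.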

18.

Posterior predictive checks to quantify lack-of-fit in admixture models of latent population structure.

Mimno, David; Blei, David M; Engelhardt, Barbara E.

Proc Natl Acad Sci U S A ; 112(26): E3441-50, 2015 Jun 30.

Article in English | MEDLINE | ID: mdl-26071445

ABSTRACT

Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of fit of a statistical model to a specific dataset. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the admixture model fit to four qualitatively different population genetic datasets: the population reference sample (POPRES) European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies.

Subjects

Models, Theoretical; Bayes Theorem; Genetic Variation; Humans; Linkage Disequilibrium; Uncertainty
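A posterior predictive check compares a statistic computed on the observed data with the same statistic computed on datasets replicated from the fitted model's posterior. The toy sketch below is not the admixture-model PPC from the paper: it assumes a single-locus Hardy-Weinberg model with a Beta(1, 1) prior on the allele frequency and uses observed heterozygosity as the discrepancy. The function name and the genotype sample are made up for illustration.

```python
import numpy as np


def ppc_heterozygosity(genotypes, n_draws=2000, seed=0):
    """Posterior predictive p-value for heterozygosity under Hardy-Weinberg.

    genotypes: NumPy array of 0/1/2 alternate-allele counts, one per individual.
    Model: allele frequency p ~ Beta(1, 1); each genotype ~ Binomial(2, p).
    """
    rng = np.random.default_rng(seed)
    n = len(genotypes)
    alt = genotypes.sum()
    observed = np.mean(genotypes == 1)
    # Conjugate Beta posterior over p given 2n observed alleles.
    p_draws = rng.beta(1 + alt, 1 + 2 * n - alt, size=n_draws)
    # One replicated dataset of n genotypes per posterior draw.
    reps = rng.binomial(2, p_draws[:, None], size=(n_draws, n))
    rep_stat = np.mean(reps == 1, axis=1)
    # Fraction of replicates with heterozygosity as low as (or lower than) observed.
    return np.mean(rep_stat <= observed)


# Made-up sample with no heterozygotes, a strong departure from Hardy-Weinberg.
geno = np.array([0] * 50 + [2] * 50)
p_value = ppc_heterozygosity(geno)
```

A very small p-value signals that the model cannot reproduce this feature of the data, which is the kind of lack-of-fit the abstract's PPCs are designed to quantify.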

19.

Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering.

Gao, Chuan; McDowell, Ian C; Zhao, Shiwen; Brown, Christopher D; Engelhardt, Barbara E.

PLoS Comput Biol ; 12(7): e1004791, 2016 07.

Article in English | MEDLINE | ID: mdl-27467526

ABSTRACT

Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Across diverse simulation scenarios, BicMix recovers latent structure with higher precision than state-of-the-art biclustering methods. Further, we develop a principled method to recover context-specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We also apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data and find tissue-specific gene networks. We validate these findings by using our tissue-specific networks to identify trans-eQTLs specific to one of four primary tissues.

Subjects

Computational Biology/methods; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic/genetics; Gene Regulatory Networks/genetics; Bayes Theorem; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Cluster Analysis; Female; Humans; Male; Models, Genetic; Oligonucleotide Array Sequence Analysis

20.

Genetic variation associated with euphorigenic effects of d-amphetamine is associated with diminished risk for schizophrenia and attention deficit hyperactivity disorder.

Hart, Amy B; Gamazon, Eric R; Engelhardt, Barbara E; Sklar, Pamela; Kähler, Anna K; Hultman, Christina M; Sullivan, Patrick F; Neale, Benjamin M; Faraone, Stephen V; de Wit, Harriet; Cox, Nancy J; Palmer, Abraham A.

Proc Natl Acad Sci U S A ; 111(16): 5968-73, 2014 Apr 22.

Article in English | MEDLINE | ID: mdl-24711425

ABSTRACT

Here, we extended our findings from a genome-wide association study of the euphoric response to d-amphetamine in healthy human volunteers by identifying enrichment between SNPs associated with response to d-amphetamine and SNPs associated with psychiatric disorders. We found that SNPs nominally associated (P ≤ 0.05 and P ≤ 0.01) with schizophrenia and attention deficit hyperactivity disorder were also nominally associated with d-amphetamine response. Furthermore, we found that the source of this enrichment was an excess of alleles that increased sensitivity to the euphoric effects of d-amphetamine and decreased susceptibility to schizophrenia and attention deficit hyperactivity disorder. In contrast, three negative control phenotypes (height, inflammatory bowel disease, and Parkinson disease) did not show this enrichment. Taken together, our results suggest that alleles identified using an acute challenge with a dopaminergic drug in healthy individuals can be used to identify alleles that confer risk for psychiatric disorders commonly treated with dopaminergic agonists and antagonists. More importantly, our results show the use of the enrichment approach as an alternative to stringent standards for genome-wide significance and suggest a relatively novel approach to the analysis of small cohorts in which intermediate phenotypes have been measured.

Subjects

Attention Deficit Disorder with Hyperactivity/drug therapy; Attention Deficit Disorder with Hyperactivity/genetics; Dextroamphetamine/therapeutic use; Euphoria; Genetic Variation; Schizophrenia/drug therapy; Schizophrenia/genetics; Bipolar Disorder/drug therapy; Bipolar Disorder/genetics; Dextroamphetamine/pharmacology; Euphoria/drug effects; Genetic Predisposition to Disease; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Protective Agents/pharmacology; Protective Agents/therapeutic use; Reproducibility of Results; Risk Factors
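The enrichment idea in this abstract, asking whether SNPs nominally associated with one trait are over-represented among SNPs nominally associated with another, can be illustrated with a one-sided hypergeometric test. This is a generic sketch under that assumption: the abstract does not name the statistical test, and the paper's procedure (for example, how linkage disequilibrium between SNPs is handled) may differ. The function name and the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, in_a, in_b, overlap):
    """One-sided P(X >= overlap) for X ~ Hypergeometric(total, in_a, in_b).

    total:   number of SNPs tested for both traits
    in_a:    SNPs nominally associated with trait A (e.g. P <= 0.05)
    in_b:    SNPs nominally associated with trait B
    overlap: SNPs nominally associated with both traits
    """
    denom = comb(total, in_b)
    upper = min(in_a, in_b)
    # Sum the exact hypergeometric probabilities of overlap, overlap + 1, ..., upper.
    tail = sum(comb(in_a, k) * comb(total - in_a, in_b - k)
               for k in range(overlap, upper + 1))
    return tail / denom


# Hypothetical counts: 10,000 SNPs, 500 hits for each trait, 40 shared.
# Under independence about 500 * 500 / 10,000 = 25 shared hits are expected.
p = hypergeom_enrichment_p(10_000, 500, 500, 40)
```

Exact integer arithmetic via `math.comb` avoids underflow in the individual terms; the final division returns a float.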
