Results 1 - 20 of 48
1.
Nat Methods ; 20(2): 229-238, 2023 02.
Article in English | MEDLINE | ID: mdl-36587187

ABSTRACT

Nonnegative matrix factorization (NMF) is widely used to analyze high-dimensional count data because, in contrast to real-valued alternatives such as factor analysis, it produces an interpretable parts-based representation. However, in applications such as spatial transcriptomics, NMF fails to incorporate known structure between observations. Here, we present nonnegative spatial factorization (NSF), a spatially-aware probabilistic dimension reduction model based on transformed Gaussian processes that naturally encourages sparsity and scales to tens of thousands of observations. NSF recovers ground truth factors more accurately than real-valued alternatives such as MEFISTO in simulations, and has lower out-of-sample prediction error than probabilistic NMF on three spatial transcriptomics datasets from mouse brain and liver. Since not all patterns of gene expression have spatial correlations, we also propose a hybrid extension of NSF that combines spatial and nonspatial components, enabling quantification of spatial importance for both observations and features. A TensorFlow implementation of NSF is available from https://github.com/willtownes/nsf-paper .


Subject(s)
Algorithms , Gene Expression Profiling , Animals , Mice , Gene Expression Profiling/methods , Genomics , Statistical Models
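As a point of reference for the abstract above, the following is a minimal sketch of plain (nonspatial) NMF fitted with multiplicative updates for squared error. It is not the NSF model and not the authors' TensorFlow code; the function name `nmf`, the toy Poisson count matrix, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix X (n x p) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, p)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: each factor is scaled by a ratio of
        # nonnegative terms, so nonnegativity is preserved at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical count data with three underlying nonnegative factors.
rng = np.random.default_rng(1)
mean = rng.random((30, 3)) @ rng.random((3, 20)) * 10
X = rng.poisson(mean).astype(float)

W, H = nmf(X, rank=3)
relative_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Because both factors stay nonnegative, each row of `H` can be read as an additive "part" and each row of `W` as the amounts of those parts in one observation. The spatial model in the abstract additionally places a Gaussian process prior over the factors, which this sketch does not attempt.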
2.
Nat Methods ; 20(9): 1379-1387, 2023 09.
Article in English | MEDLINE | ID: mdl-37592182

ABSTRACT

Spatially resolved genomic technologies have allowed us to study the physical organization of cells and tissues, and promise an understanding of local interactions between cells. However, it remains difficult to precisely align spatial observations across slices, samples, scales, individuals and technologies. Here, we propose a probabilistic model that aligns spatially-resolved samples onto a known or unknown common coordinate system (CCS) with respect to phenotypic readouts (for example, gene expression). Our method, Gaussian Process Spatial Alignment (GPSA), consists of a two-layer Gaussian process: the first layer maps observed samples' spatial locations onto a CCS, and the second layer maps from the CCS to the observed readouts. Our approach enables complex downstream spatially aware analyses that are impossible or inaccurate with unaligned data, including an analysis of variance, creation of a dense three-dimensional (3D) atlas from sparse two-dimensional (2D) slices or association tests across data modalities.


Subject(s)
Genomics , Statistical Models , Humans , Normal Distribution
3.
BMC Bioinformatics ; 25(1): 291, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39232666

ABSTRACT

Genomics methods have uncovered patterns in a range of biological systems, but obscure important aspects of cell behavior: the shapes, relative locations, movement, and interactions of cells in space. Spatial technologies that collect genomic or epigenomic data while preserving spatial information have begun to overcome these limitations. These new data promise a deeper understanding of the factors that affect cellular behavior, and in particular the ability to directly test existing theories about cell state and variation in the context of morphology, location, motility, and signaling that could not be tested before. Rapid advancements in resolution, ease-of-use, and scale of spatial genomics technologies to address these questions also require an updated toolkit of statistical methods with which to interrogate these data. We present a framework to respond to this new avenue of research: four open biological questions that can now be answered using spatial genomics data paired with methods for analysis. We outline spatial data modalities for each open question that may yield specific insights, discuss how conflicting theories may be tested by comparing the data to conceptual models of biological behavior, and highlight statistical and machine learning-based tools that may prove particularly helpful to recover biological understanding.


Subject(s)
Genomics , Genomics/methods , Humans , Machine Learning
4.
Proc Natl Acad Sci U S A ; 118(32), 2021 08 10.
Article in English | MEDLINE | ID: mdl-34362843

ABSTRACT

Multicellular organisms rely on spatial signaling among cells to drive their organization, development, and response to stimuli. Several models have been proposed to capture the behavior of spatial signaling in multicellular systems, but existing approaches fail to capture both the autonomous behavior of single cells and the interactions of a cell with its neighbors simultaneously. We propose a spatiotemporal model of dynamic cell signaling based on Hawkes processes (self-exciting point processes) that model the signaling processes within a cell and spatial couplings between cells. With this cellular point process (CPP), we capture both the single-cell pathway activation rate and the magnitude and duration of signaling between cells relative to their spatial location. Furthermore, our model captures tissues composed of heterogeneous cell types with different bursting rates and signaling behaviors across multiple signaling proteins. We apply our model to epithelial cell systems that exhibit a range of autonomous and spatial signaling behaviors basally and under pharmacological exposure. Our model identifies known drug-induced signaling deficits, characterizes signaling changes across a wound front, and generalizes to multichannel observations.


Subject(s)
Keratinocytes/metabolism , Biological Models , Signal Transduction , Animals , Dipeptides/pharmacology , Dogs , Epithelial Cells , Hydroxamic Acids/pharmacology , Keratinocytes/cytology , Keratinocytes/drug effects , MAP Kinase Signaling System/drug effects , Madin Darby Canine Kidney Cells , Inbred Mice , Transgenic Mice , Statistical Models , Protein Kinase Inhibitors/pharmacology , Signal Transduction/drug effects , Spatio-Temporal Analysis
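To make the term "self-exciting point process" in the abstract above concrete, here is a minimal sketch that simulates a single-cell, purely temporal Hawkes process with an exponential kernel by thinning. It leaves out the spatial couplings between cells that the CPP model adds; the function names and the values of `mu`, `alpha`, and `beta` are assumptions for illustration only.

```python
import numpy as np

def intensity(t, events, mu, alpha, beta):
    """Baseline rate mu plus an exponentially decaying bump per past event."""
    past = np.asarray(events, dtype=float)
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(mu, alpha, beta, t_max, seed=0):
    """Simulate event times on [0, t_max) by thinning (rejection sampling)."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        # The kernel only decays between events, so the intensity right now
        # is an upper bound on the intensity until the next event.
        lam_bar = intensity(t, events, mu, alpha, beta)
        t += rng.exponential(1.0 / lam_bar)
        if t >= t_max:
            break
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return np.array(events)

# alpha / beta < 1 keeps the process from exploding (each event triggers
# fewer than one further event on average).
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, t_max=50.0)
```

Here `mu` plays the role of an autonomous activation rate, while `alpha` and `beta` set the magnitude and duration of the excitation that follows each event. Those are the kinds of quantities the abstract describes estimating, although this sketch only simulates and does not fit anything.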
5.
Genome Res ; 30(2): 195-204, 2020 02.
Article in English | MEDLINE | ID: mdl-31992614

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be near each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.


Subject(s)
Genetic Epistasis/genetics , RNA-Seq , Single-Cell Analysis/methods , Software , Cluster Analysis , Gene Expression Profiling , Humans , Exome Sequencing
6.
Nature ; 550(7675): 204-213, 2017 10 11.
Article in English | MEDLINE | ID: mdl-29022597

ABSTRACT

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation/genetics , Genetic Variation , Organ Specificity/genetics , Alleles , Human Chromosomes/genetics , Disease/genetics , Female , Human Genome/genetics , Genotype , Humans , Male , Quantitative Trait Loci/genetics
7.
Biochem J ; 479(11): 1257-1263, 2022 06 17.
Article in English | MEDLINE | ID: mdl-35713413

ABSTRACT

Petabytes of increasingly complex and multidimensional live cell and tissue imaging data are generated every year. These videos hold large promise for understanding biology at a deep and fundamental level, as they capture single-cell and multicellular events occurring over time and space. However, the current modalities for analysis and mining of these data are scattered and user-specific, preventing more unified analyses from being performed over different datasets and obscuring possible scientific insights. Here, we propose a unified pipeline for storage, segmentation, analysis, and statistical parametrization of live cell imaging datasets.


Subject(s)
Datasets as Topic
8.
BMC Bioinformatics ; 23(1): 529, 2022 Dec 08.
Article in English | MEDLINE | ID: mdl-36482321

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. RESULTS: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. CONCLUSION: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.

9.
PLoS Comput Biol ; 17(1): e1008223, 2021 01.
Article in English | MEDLINE | ID: mdl-33513136

ABSTRACT

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.


Subject(s)
Glucocorticoids/pharmacology , Statistical Models , Transcriptome/drug effects , A549 Cells , Algorithms , Computational Biology , Humans , Lung/chemistry , Lung/metabolism , Machine Learning , Software , Transcriptome/genetics
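The Granger-causality idea behind the abstract above can be sketched in a few lines: a series x is said to Granger-cause y if lagged values of x improve the prediction of y beyond what lagged values of y already achieve. The sketch below uses ordinary least squares on two simulated series. It is a simplification, not the BETS method, which uses elastic net regression with bootstrapped stability selection across thousands of genes. The names `lagged` and `granger_gain` and the toy data are assumptions.

```python
import numpy as np

def lagged(x, lag):
    """Columns x[t-1], ..., x[t-lag] aligned with targets t = lag, ..., T-1."""
    return np.column_stack([x[lag - k: len(x) - k] for k in range(1, lag + 1)])

def rss(A, target):
    """Residual sum of squares of the least-squares fit of target on A."""
    coef = np.linalg.lstsq(A, target, rcond=None)[0]
    return np.sum((target - A @ coef) ** 2)

def granger_gain(x, y, lag=2):
    """Fraction of y's residual error removed by adding lags of x."""
    target = y[lag:]
    own = np.column_stack([np.ones(len(target)), lagged(y, lag)])
    full = np.column_stack([own, lagged(x, lag)])
    return 1.0 - rss(full, target) / rss(own, target)

# Hypothetical pair of "genes": y follows x with a one-step delay.
rng = np.random.default_rng(0)
T = 300
x = rng.standard_normal(T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.3 * rng.standard_normal(T - 1)

gain_xy = granger_gain(x, y)  # large: past x helps predict y
gain_yx = granger_gain(y, x)  # near zero: past y does not help predict x
```

The asymmetry between `gain_xy` and `gain_yx` is what gives a directed edge. The sign of the fitted coefficient on the lagged predictor would indicate whether the effect is activating or inhibitory, as the abstract mentions.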
10.
Neuroimage ; 245: 118580, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34740792

ABSTRACT

A key problem in functional magnetic resonance imaging (fMRI) is to estimate spatial activity patterns from noisy high-dimensional signals. Spatial smoothing provides one approach to regularizing such estimates. However, standard smoothing methods ignore the fact that correlations in neural activity may fall off at different rates in different brain areas, or exhibit discontinuities across anatomical or functional boundaries. Moreover, such methods do not exploit the fact that widely separated brain regions may exhibit strong correlations due to bilateral symmetry or the network organization of brain regions. To capture this non-stationary spatial correlation structure, we introduce the brain kernel, a continuous covariance function for whole-brain activity patterns. We define the brain kernel in terms of a continuous nonlinear mapping from 3D brain coordinates to a latent embedding space, parametrized with a Gaussian process (GP). The brain kernel specifies the prior covariance between voxels as a function of the distance between their locations in embedding space. The GP mapping warps the brain nonlinearly so that highly correlated voxels are close together in latent space, and uncorrelated voxels are far apart. We estimate the brain kernel using resting-state fMRI data, and we develop an exact, scalable inference method based on block coordinate descent to overcome the challenges of high dimensionality (10-100K voxels). Finally, we illustrate the brain kernel's usefulness with applications to brain decoding and factor analysis with multiple task-based fMRI datasets.


Subject(s)
Brain Mapping/methods , Computer-Assisted Image Processing/methods , Magnetic Resonance Imaging/methods , Neuroimaging/methods , Humans , Three-Dimensional Imaging
11.
Genome Res ; 28(9): 1272-1284, 2018 09.
Article in English | MEDLINE | ID: mdl-30097539

ABSTRACT

Glucocorticoids are potent steroid hormones that regulate immunity and metabolism by activating the transcription factor (TF) activity of glucocorticoid receptor (GR). Previous models have proposed that DNA binding motifs and sites of chromatin accessibility predetermine GR binding and activity. However, there are vast excesses of both features relative to the number of GR binding sites. Thus, these features alone are unlikely to account for the specificity of GR binding and activity. To identify genomic and epigenetic contributions to GR binding specificity and the downstream changes resultant from GR binding, we performed hundreds of genome-wide measurements of TF binding, epigenetic state, and gene expression across a 12-h time course of glucocorticoid exposure. We found that glucocorticoid treatment induces GR to bind to nearly all pre-established enhancers within minutes. However, GR binds to only a small fraction of the set of accessible sites that lack enhancer marks. Once GR is bound to enhancers, a combination of enhancer motif composition and interactions between enhancers then determines the strength and persistence of GR binding, which consequently correlates with dramatic shifts in enhancer activation. Over the course of several hours, highly coordinated changes in TF binding and histone modification occupancy occur specifically within enhancers, and these changes correlate with changes in the expression of nearby genes. Following GR binding, changes in the binding of other TFs precede changes in chromatin accessibility, suggesting that other TFs are also sensitive to genomic features beyond that of accessibility.


Subject(s)
Genetic Enhancer Elements , Histone Code , Nucleotide Motifs , Glucocorticoid Receptors/metabolism , Transcriptional Activation , Tumor Cell Line , Genetic Epigenesis , Humans , Protein Binding , Transcription Factors/metabolism
12.
BMC Bioinformatics ; 21(1): 324, 2020 Jul 21.
Article in English | MEDLINE | ID: mdl-32693778

ABSTRACT

BACKGROUND: Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. RESULTS: Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student's t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model's ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high throughput experiments on available experimental data. CONCLUSION: We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.


Subject(s)
Nonlinear Dynamics , RNA-Seq , Single-Cell Analysis/methods , Blood Cells/metabolism , Gene Expression Regulation , Humans , Genetic Models , Neurons/metabolism , Normal Distribution , Principal Component Analysis , Time Factors
13.
Genome Res ; 27(2): 320-333, 2017 02.
Article in English | MEDLINE | ID: mdl-27864351

ABSTRACT

Microbial growth curves are used to study differential effects of media, genetics, and stress on microbial population growth. Consequently, many modeling frameworks exist to capture microbial population growth measurements. However, current models are designed to quantify growth under conditions for which growth has a specific functional form. Extensions to these models are required to quantify the effects of perturbations, which often exhibit nonstandard growth curves. Rather than assume specific functional forms for experimental perturbations, we developed a general and robust model of microbial population growth curves using Gaussian process (GP) regression. GP regression modeling of high-resolution time-series growth data enables accurate quantification of population growth and allows explicit control of effects from other covariates such as genetic background. This framework substantially outperforms commonly used microbial population growth models, particularly when modeling growth data from environmentally stressed populations. We apply the GP growth model and develop statistical tests to quantify the differential effects of environmental perturbations on microbial growth across a large compendium of genotypes in archaea and yeast. This method accurately identifies known transcriptional regulators and implicates novel regulators of growth under standard and stress conditions in the model archaeal organism Halobacterium salinarum. For yeast, our method correctly identifies known phenotypes for a diversity of genetic backgrounds under cycloheximide stress and also detects previously unidentified oxidative stress sensitivity across a subset of strains. Together, these results demonstrate that the GP models are interpretable, recapitulating biological knowledge of growth response while providing new insights into the relevant parameters affecting microbial population growth.


Subject(s)
Halobacterium salinarum/growth & development , Biological Models , Yeasts/growth & development , Halobacterium salinarum/genetics , Normal Distribution , Phenotype , Yeasts/genetics
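Several abstracts in this listing rely on Gaussian process (GP) regression, and the growth-curve abstract above is the most direct case. The following is a minimal sketch of GP regression with a squared-exponential kernel fitted to a simulated growth curve. The kernel choice, the fixed hyperparameters, the names `rbf` and `gp_posterior`, and the logistic toy data are all assumptions; the published model also handles covariates and hypothesis tests that are not shown here.

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between 1-D time points a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(t_train, y_train, t_test, length=1.0, var=1.0, noise=0.1):
    """Posterior mean and covariance of a zero-mean GP at t_test."""
    K = rbf(t_train, t_train, length, var) + noise**2 * np.eye(len(t_train))
    Ks = rbf(t_test, t_train, length, var)
    Kss = rbf(t_test, t_test, length, var)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, Kss - v.T @ v

def truth(t):
    """Hypothetical noiseless growth curve (logistic shape)."""
    return 1.0 / (1.0 + np.exp(-(t - 5.0)))

rng = np.random.default_rng(0)
t_train = np.linspace(0, 10, 25)
y_train = truth(t_train) + 0.02 * rng.standard_normal(t_train.size)
t_test = np.linspace(1, 9, 50)

mean, cov = gp_posterior(t_train, y_train, t_test,
                         length=2.0, var=1.0, noise=0.05)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))   # pointwise uncertainty
```

No functional form for the curve is specified anywhere in the fit, which is the property the abstract highlights for perturbed, nonstandard growth. In practice the hyperparameters would be estimated from the data rather than fixed as they are here.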
14.
Genome Res ; 27(11): 1843-1858, 2017 11.
Article in English | MEDLINE | ID: mdl-29021288

ABSTRACT

Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , RNA Splicing , RNA Sequence Analysis/methods , Bayes Theorem , Genetic Databases , Gene Expression Regulation , Genotyping Techniques , Humans , Organ Specificity , Single Nucleotide Polymorphism
15.
Bioinformatics ; 35(2): 200-210, 2019 01 15.
Article in English | MEDLINE | ID: mdl-29982387

ABSTRACT

Motivation: Identifying variants, both discrete and continuous, that are associated with quantitative traits, or QTs, is the primary focus of quantitative genetics. Most current methods are limited to identifying mean effects, or associations between genotype or covariates and the mean value of a quantitative trait. It is possible, however, that a variant may affect the variance of the quantitative trait in lieu of, or in addition to, affecting the trait mean. Here, we develop a general methodology to identify covariates with variance effects on a quantitative trait using a Bayesian heteroskedastic linear regression model (BTH). We compare BTH with existing methods to detect variance effects across a large range of simulations drawn from scenarios common to the analysis of quantitative traits. Results: We find that BTH and a double generalized linear model (dglm) outperform classical tests used for detecting variance effects in recent genomic studies. We show BTH and dglm are less likely to generate spurious discoveries through simulations and application to identifying methylation variance QTs and expression variance QTs. We identify four variance effects of sex in the Cardiovascular and Pharmacogenetics study. Our work is the first to offer a comprehensive view of variance-identifying methodology. We identify shortcomings in previously used methodology and provide a more conservative and robust alternative. We extend variance effect analysis to a wide array of covariates that enables a new statistical dimension in the study of sex- and age-specific quantitative trait effects. Availability and implementation: https://github.com/b2du/bth. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Bayes Theorem , Genomics/methods , Linear Models , Genetic Models , Quantitative Trait Loci , Analysis of Variance , Computational Biology , Humans , Phenotype
16.
BMC Med Inform Decis Mak ; 20(1): 152, 2020 07 08.
Article in English | MEDLINE | ID: mdl-32641134

ABSTRACT

BACKGROUND: For real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from observational large-scale electronic health records (EHRs) and make accurate real-time predictions is a critical step. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring. METHODS: We propose MedGP, a statistical framework that incorporates 24 clinical covariates and supports a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel to enable tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in patient observations. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates. RESULTS: We evaluate and compare results from MedGP on the task of online prediction for three patient subgroups from two medical data sets across 8,043 patients. We find MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals. CONCLUSIONS: The MedGP framework is robust and efficient in estimating the temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The publicly available code is at https://github.com/bee-hive/MedGP .


Subject(s)
Algorithms , Statistical Models , Bayes Theorem , Normal Distribution
17.
PLoS Comput Biol ; 14(1): e1005896, 2018 01.
Article in English | MEDLINE | ID: mdl-29337990

ABSTRACT

Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.


Subject(s)
Cluster Analysis , Neoplastic Gene Expression Regulation , Lung Neoplasms/genetics , A549 Cells , Algorithms , Tumor Cell Line , Computational Biology , Computer Simulation , Dexamethasone/chemistry , Gene Expression Profiling , Glucocorticoids/chemistry , Histones/chemistry , Humans , Hydrogen Bonding , Hydrogen Peroxide/chemistry , Lung Neoplasms/drug therapy , Biological Models , Normal Distribution , Oligonucleotide Array Sequence Analysis , RNA Sequence Analysis , Time Factors , Transcription Factors/chemistry
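The abstract above models the number of clusters with a Dirichlet process. One common way to see how that prior behaves is the Chinese restaurant process, sketched below: each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. This is only a draw from the prior over partitions, not the DPGP software and not its inference procedure; the function name `crp` and the value of `alpha` are assumptions.

```python
import numpy as np

def crp(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process."""
    rng = np.random.default_rng(seed)
    labels = [0]          # the first item always starts cluster 0
    counts = [1]          # current size of each cluster
    for _ in range(1, n):
        # Existing clusters are weighted by size, a new cluster by alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return np.array(labels)

labels = crp(100, alpha=1.0)
n_clusters = labels.max() + 1
```

Because the number of clusters is not fixed in advance, it can grow with the data. In a mixture model of the kind the abstract describes, each cluster would additionally carry its own Gaussian process over time, and the labels would be inferred from the expression trajectories.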
18.
Nature ; 502(7471): 377-80, 2013 Oct 17.
Article in English | MEDLINE | ID: mdl-23995691

ABSTRACT

Statins are prescribed widely to lower plasma low-density lipoprotein (LDL) concentrations and cardiovascular disease risk and have been shown to have beneficial effects in a broad range of patients. However, statins are associated with an increased risk, albeit small, of clinical myopathy and type 2 diabetes. Despite evidence for substantial genetic influence on LDL concentrations, pharmacogenomic trials have failed to identify genetic variations with large effects on either statin efficacy or toxicity, and have produced little information regarding mechanisms that modulate statin response. Here we identify a downstream target of statin treatment by screening for the effects of in vitro statin exposure on genetic associations with gene expression levels in lymphoblastoid cell lines derived from 480 participants of a clinical trial of simvastatin treatment. This analysis identified six expression quantitative trait loci (eQTLs) that interacted with simvastatin exposure, including rs9806699, a cis-eQTL for the gene glycine amidinotransferase (GATM) that encodes the rate-limiting enzyme in creatine synthesis. We found this locus to be associated with incidence of statin-induced myotoxicity in two separate populations (meta-analysis odds ratio = 0.60). Furthermore, we found that GATM knockdown in hepatocyte-derived cell lines attenuated transcriptional response to sterol depletion, demonstrating that GATM may act as a functional link between statin-mediated lowering of cholesterol and susceptibility to statin-induced myopathy.


Subject(s)
Amidinotransferases/genetics , Gene Expression Regulation/drug effects , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Muscular Diseases/chemically induced , Quantitative Trait Loci/genetics , Simvastatin/adverse effects , Amidinotransferases/deficiency , Amidinotransferases/metabolism , Cell Line , Cholesterol/deficiency , Cholesterol/metabolism , Cholesterol/pharmacology , Gene Knockdown Techniques , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Lymphocytes/cytology , Lymphocytes/drug effects , Lymphocytes/metabolism , Muscular Diseases/genetics , Muscular Diseases/metabolism , Polymorphism, Single Nucleotide/genetics , Simvastatin/pharmacology , Sterol Regulatory Element Binding Proteins/metabolism , Transcription, Genetic/drug effects
19.
Proc Natl Acad Sci U S A ; 112(26): E3441-50, 2015 Jun 30.
Article in English | MEDLINE | ID: mdl-26071445

ABSTRACT

Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of fit of a statistical model to a specific dataset. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the admixture model fit to four qualitatively different population genetic datasets: the population reference sample (POPRES) European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies.


Subject(s)
Models, Theoretical , Bayes Theorem , Genetic Variation , Humans , Linkage Disequilibrium , Uncertainty
20.
PLoS Comput Biol ; 12(7): e1004791, 2016 07.
Article in English | MEDLINE | ID: mdl-27467526

ABSTRACT

Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Compared with state-of-the-art biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios. Further, we develop a principled method to recover context-specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data, and we find tissue-specific gene networks. We validate these findings by using our tissue-specific networks to identify trans-eQTLs specific to one of four primary tissues.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Gene Regulatory Networks/genetics , Bayes Theorem , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cluster Analysis , Female , Humans , Male , Models, Genetic , Oligonucleotide Array Sequence Analysis