Results 1 - 20 of 62
1.
Stat Med ; 43(14): 2713-2733, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38690642

ABSTRACT

This article presents a novel method for learning time-varying dynamic Bayesian networks. The proposed method breaks down the dynamic Bayesian network learning problem into a sequence of regression inference problems and tackles each problem using the Markov neighborhood regression technique. Notably, the method demonstrates scalability concerning data dimensionality, accommodates time-varying network structure, and naturally handles multi-subject data. The proposed method exhibits consistency and offers superior performance compared to existing methods in terms of estimation accuracy and computational efficiency, as supported by extensive numerical experiments. To showcase its effectiveness, we apply the proposed method to an fMRI study investigating the effective connectivity among various regions of interest (ROIs) during an emotion-processing task. Our findings reveal the pivotal role of the subcortical-cerebellum in emotion processing.


Subject(s)
Bayes Theorem , Emotions , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Emotions/physiology , Markov Chains , Brain/diagnostic imaging , Brain/physiology , Computer Simulation
2.
Biostatistics ; 22(2): 233-249, 2021 04 10.
Article in English | MEDLINE | ID: mdl-33838043

ABSTRACT

Motivated by the study of the molecular mechanism underlying type 1 diabetes with gene expression data collected from both patients and healthy controls at multiple time points, we propose a hybrid Bayesian method for jointly estimating multiple dependent Gaussian graphical models with data observed under distinct conditions, which avoids inversion of high-dimensional covariance matrices and thus can be executed very fast. We prove the consistency of the proposed method under mild conditions. The numerical results indicate the superiority of the proposed method over existing ones in both estimation accuracy and computational efficiency. Extension of the proposed method to joint estimation of multiple mixed graphical models is straightforward.


Subject(s)
Diabetes Mellitus, Type 1 , Gene Regulatory Networks , Bayes Theorem , Diabetes Mellitus, Type 1/genetics , Humans , Models, Statistical , Normal Distribution
3.
Stat Med ; 41(20): 4057-4078, 2022 09 10.
Article in English | MEDLINE | ID: mdl-35688606

ABSTRACT

High-dimensional inference is one of the fundamental problems in modern biomedical studies. However, the existing methods do not perform satisfactorily. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the Markov neighborhood regression method such that it can be applied to statistical inference for high-dimensional generalized linear models with mixed features. The Markov neighborhood regression method is highly attractive in that it breaks the high-dimensional inference problem into a series of low-dimensional inference problems. The proposed method is applied to the Cancer Cell Line Encyclopedia data for identification of the genes and mutations that are sensitive to the response of anti-cancer drugs. The numerical results favor the Markov neighborhood regression method over the existing ones.


Subject(s)
Models, Statistical , Humans , Likelihood Functions , Linear Models , Markov Chains , Regression Analysis
4.
J Stat Comput Simul ; 92(2): 318-336, 2022.
Article in English | MEDLINE | ID: mdl-35559269

ABSTRACT

We propose a class of adaptive stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, where the drift function is adaptively adjusted according to the gradient of past samples to accelerate the convergence of the algorithm in simulations of the distributions with pathological curvatures. We establish the convergence of the proposed algorithms under mild conditions. The numerical examples indicate that the proposed algorithms can significantly outperform the popular SGMCMC algorithms, such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian Monte Carlo (SGHMC) and preconditioned SGLD, in both simulation and optimization tasks. In particular, the proposed algorithms can converge quickly for the distributions for which the energy landscape possesses pathological curvatures.
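For orientation, the baseline that the abstract compares against, stochastic gradient Langevin dynamics (SGLD), can be sketched in a few lines. This is a minimal illustration and not the adaptive algorithm proposed in the article: the target (posterior of a Gaussian mean with unit variance and a flat prior), the function name `sgld_gaussian_mean`, and all parameter values are assumptions chosen for the example.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=5000, batch_size=10, step_size=1e-3, seed=0):
    """Plain SGLD for the posterior of a Gaussian mean.

    Assumed toy model (not from the article): x_i ~ N(theta, 1) with a
    flat prior on theta, so the log-posterior gradient is sum(x_i - theta).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Unbiased minibatch estimate of the full-data gradient.
        grad = (n / batch_size) * sum(x - theta for x in batch)
        # Langevin update: half a step along the gradient plus N(0, step_size) noise.
        theta += 0.5 * step_size * grad + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples
```

After a burn-in period, the average of the returned samples should sit close to the sample mean of `data`. The drift term here is the raw minibatch gradient; the abstract describes adjusting that drift adaptively using gradients of past samples, which this sketch does not attempt.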

5.
Stat Probab Lett ; 180, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34744226

ABSTRACT

Deep learning has achieved great successes in many machine learning tasks. However, the deep neural networks (DNNs) are often severely over-parameterized, making them computationally expensive, memory intensive, less interpretable and mis-calibrated. We study sparse DNNs under the Bayesian framework: we establish posterior consistency and structure selection consistency for Bayesian DNNs with a spike-and-slab prior, and illustrate their performance using examples on high-dimensional nonlinear variable selection, large network compression and model calibration. Our numerical results indicate that sparsity is essential for improving the prediction accuracy and calibration of the DNN.

6.
Biostatistics ; 20(4): 565-581, 2019 10 01.
Article in English | MEDLINE | ID: mdl-29788035

ABSTRACT

Digital pathology imaging of tumor tissues, which captures histological details in high resolution, is fast becoming a routine clinical procedure. Recent developments in deep-learning methods have enabled the identification, characterization, and classification of individual cells from pathology image analysis at a large scale. This creates new opportunities to study the spatial patterns of and interactions among different types of cells. Reliable statistical approaches to modeling such spatial patterns and interactions can provide insight into tumor progression and shed light on the biological mechanisms of cancer. In this article, we consider the problem of modeling a pathology image with irregular locations of three different types of cells: lymphocyte, stromal, and tumor cells. We propose a novel Bayesian hierarchical model, which incorporates a hidden Potts model to project the irregularly distributed cells to a square lattice and a Markov random field prior model to identify regions in a heterogeneous pathology image. The model allows us to quantify the interactions between different types of cells, some of which are clinically meaningful. We use Markov chain Monte Carlo sampling techniques, combined with a double Metropolis-Hastings algorithm, in order to simulate samples approximately from a distribution with an intractable normalizing constant. The proposed model was applied to the pathology images of 205 lung cancer patients from the National Lung Screening Trial, and the results show that the interaction strength between tumor and stromal cells predicts patient prognosis (P = 0.005). This statistical methodology provides a new perspective for understanding the role of cell-cell interactions in cancer progression.


Subject(s)
Algorithms , Image Interpretation, Computer-Assisted , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Models, Statistical , Bayes Theorem , Humans , Markov Chains , Monte Carlo Method
7.
Biostatistics ; 19(2): 216-232, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29036516

ABSTRACT

Gaussian graphical models have been widely used to construct gene regulatory networks from gene expression data. Most existing methods for Gaussian graphical models are designed to model homogeneous data, assuming a single Gaussian distribution. In practice, however, data may consist of gene expression studies with unknown confounding factors, such as study cohort, microarray platform, and experimental batch, which produce heterogeneous data and hence lead to false positive edges or low detection power in the resulting network. To overcome this problem and improve the performance in constructing gene networks, we propose a two-stage approach to construct a gene network from heterogeneous data. The first stage is to perform a clustering analysis in order to assign samples to a few clusters where the samples in each cluster are approximately homogeneous, and the second stage is to conduct an integrative analysis of networks from each cluster. In particular, we first apply a model-based clustering method using the singular value decomposition for high-dimensional data, and then integrate the networks from each cluster using the integrative ψ-learning method. The proposed method is based on an equivalent measure of partial correlation coefficients in Gaussian graphical models, which is computed with a reduced conditional set and is thus useful for high-dimensional data. We compare the proposed two-stage learning approach with some existing methods in various simulation settings, and demonstrate the robustness of the proposed method. Finally, it is applied to integrate multiple gene expression studies of lung adenocarcinoma to identify potential therapeutic targets and treatment biomarkers.
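The idea of a partial correlation computed with a small conditioning set can be illustrated with the textbook first-order formula, where the correlation between two variables is adjusted for a single third variable. This is only a sketch of that standard formula under the assumption of one conditioning variable; it is not the ψ-learning procedure itself, and the helper names `pearson` and `partial_correlation` are hypothetical.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
```

When two variables are correlated only because both depend on `z`, the marginal correlation is large while the partial correlation is near zero, which is why an edge between them would be dropped from a Gaussian graphical model.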


Subject(s)
Biostatistics/methods , Data Interpretation, Statistical , Gene Expression , Gene Regulatory Networks , Genomics/methods , Models, Statistical , Cluster Analysis , Humans , Lung Neoplasms/genetics
8.
Neural Comput ; 31(6): 1183-1214, 2019 06.
Article in English | MEDLINE | ID: mdl-30979349

ABSTRACT

Bayesian networks have been widely used in many scientific fields for describing the conditional independence relationships for a large set of random variables. This letter proposes a novel algorithm, the so-called p-learning algorithm, for learning moral graphs for high-dimensional Bayesian networks. The moral graph is a Markov network representation of the Bayesian network and also the key to construction of the Bayesian network for constraint-based algorithms. The consistency of the p-learning algorithm is justified under the small-n, large-p scenario. The numerical results indicate that the p-learning algorithm significantly outperforms the existing ones, such as the PC, grow-shrink, incremental association, semi-interleaved hiton, hill-climbing, and max-min hill-climbing. Under the sparsity assumption, the p-learning algorithm has a computational complexity of O(p^2) even in the worst case, while the existing algorithms have a computational complexity of O(p^3) in the worst case.


Subject(s)
Algorithms , Bayes Theorem , Neural Networks, Computer , Humans
9.
BMC Bioinformatics ; 18(1): 186, 2017 Mar 23.
Article in English | MEDLINE | ID: mdl-28335719

ABSTRACT

BACKGROUND: Gene regulatory networks reveal how genes work together to carry out their biological functions. Reconstructions of gene networks from gene expression data greatly facilitate our understanding of underlying biological mechanisms and provide new opportunities for biomarker and drug discoveries. In gene networks, a gene that has many interactions with other genes is called a hub gene, which usually plays an essential role in gene regulation and biological processes. In this study, we developed a method for reconstructing gene networks using a partial correlation-based approach that incorporates prior information about hub genes. Through simulation studies and two real-data examples, we compare the performance in estimating the network structures between the existing methods and the proposed method. RESULTS: In simulation studies, we show that the proposed strategy reduces errors in estimating network structures compared to the existing methods. When applied to Escherichia coli, the regulation network constructed by our proposed ESPACE method is more consistent with current biological knowledge than the SPACE method. Furthermore, application of the proposed method in lung cancer has identified hub genes whose mRNA expression predicts cancer progress and patient response to treatment. CONCLUSIONS: We have demonstrated that incorporating hub gene information in estimating network structures can improve the performance of the existing methods.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Humans
10.
Biometrics ; 73(4): 1221-1230, 2017 12.
Article in English | MEDLINE | ID: mdl-28294287

ABSTRACT

In recent years, next generation sequencing (NGS) has gradually replaced microarray as the major platform in measuring gene expressions. Compared to microarray, NGS has many advantages, such as less noise and higher throughput. However, the discreteness of NGS data also challenges the existing statistical methodology. In particular, an appropriate statistical method for reconstructing gene regulatory networks using NGS data is still lacking in the literature. The existing local Poisson graphical model method is not consistent and can only infer certain local structures of the network. In this article, we propose a random effect model-based transformation to continuize NGS data; we then transform the continuized data to Gaussian via a semiparametric transformation and apply an equivalent partial correlation selection method to reconstruct gene regulatory networks. The proposed method is consistent. The numerical results indicate that the proposed method can lead to much more accurate inference of gene regulatory networks than the local Poisson graphical model and other existing methods. The proposed data-continuized transformation fills the theoretical gap for how to transform discrete data to continuous data and facilitates NGS data analysis. The proposed data-continuized transformation also makes it feasible to integrate different types of data, such as microarray and RNA-seq data, in reconstruction of gene regulatory networks.


Subject(s)
Gene Regulatory Networks/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Statistics as Topic
11.
PLoS Genet ; 8(1): e1002482, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22291610

ABSTRACT

An important follow-up step after genetic markers are found to be associated with a disease outcome is a more detailed analysis investigating how the implicated gene or chromosomal region and an established environment risk factor interact to influence the disease risk. The standard approach to this study of gene-environment interaction considers one genetic marker at a time and therefore could misrepresent and underestimate the genetic contribution to the joint effect when one or more functional loci, some of which might not be genotyped, exist in the region and interact with the environment risk factor in a complex way. We develop a more global approach based on a Bayesian model that uses a latent genetic profile variable to capture all of the genetic variation in the entire targeted region and allows the environment effect to vary across different genetic profile categories. We also propose a resampling-based test derived from the developed Bayesian model for the detection of gene-environment interaction. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the Bayesian model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region, which contains a cluster of nicotinic acetylcholine receptor genes and has been shown to be associated with both lung cancer and smoking behavior. We find evidence for gene-environment interaction (P-value = 0.016), with the smoking effect appearing to be stronger in subjects with a genetic profile associated with a higher lung cancer risk; the conventional test of gene-environment interaction based on the single-marker approach is far from significant.


Subject(s)
Bayes Theorem , Chromosomes, Human, Pair 15/genetics , Disease/genetics , Gene-Environment Interaction , Lung Neoplasms/genetics , Receptors, Nicotinic/genetics , Algorithms , Computer Simulation , Genetic Association Studies , Genetic Markers , Genotype , Humans , Markov Chains , Models, Theoretical , Monte Carlo Method , Polymorphism, Single Nucleotide , Risk Factors , Smoking
12.
Neural Netw ; 179: 106512, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-39032394

ABSTRACT

Network embedding is a general-purpose machine learning technique that converts network data from non-Euclidean space to Euclidean space, facilitating downstream analyses for the networks. However, existing embedding methods are often optimization-based, with the embedding dimension determined in a heuristic or ad hoc way, which can cause potential bias in downstream statistical inference. Additionally, existing deep embedding methods can suffer from a nonidentifiability issue due to the universal approximation power of deep neural networks. We address these issues within a rigorous statistical framework. We treat the embedding vectors as missing data, reconstruct the network features using a sparse decoder, and simultaneously impute the embedding vectors and train the sparse decoder using an adaptive stochastic gradient Markov chain Monte Carlo (MCMC) algorithm. Under mild conditions, we show that the sparse decoder provides a parsimonious mapping from the embedding space to network features, enabling effective selection of the embedding dimension and overcoming the nonidentifiability issue encountered by existing deep embedding methods. Furthermore, we show that the embedding vectors converge weakly to a desired posterior distribution in the 2-Wasserstein distance, addressing the potential bias issue experienced by existing embedding methods. This work lays down the first theoretical foundation for network embedding within the framework of missing data imputation.

13.
Neural Comput ; 25(8): 2199-234, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23607562

ABSTRACT

Simulating from distributions with intractable normalizing constants has been a long-standing problem in machine learning. In this letter, we propose a new algorithm, the Monte Carlo Metropolis-Hastings (MCMH) algorithm, for tackling this problem. The MCMH algorithm is a Monte Carlo version of the Metropolis-Hastings algorithm. It replaces the unknown normalizing constant ratio by a Monte Carlo estimate in simulations, while still converging, as shown in the letter, to the desired target distribution under mild conditions. The MCMH algorithm is illustrated with spatial autologistic models and exponential random graph models. Unlike other auxiliary variable Markov chain Monte Carlo (MCMC) algorithms, such as the Møller and exchange algorithms, the MCMH algorithm avoids the requirement for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is not available or very expensive. The MCMH algorithm can also be applied to Bayesian inference for random effect models and missing data problems that involve simulations from a distribution with intractable integrals.


Subject(s)
Algorithms , Artificial Intelligence , Electronic Data Processing , Models, Theoretical , Monte Carlo Method , Bayes Theorem , Computer Simulation , Humans , Neoplasms/mortality , Social Support
14.
J Comput Graph Stat ; 32(2): 448-469, 2023.
Article in English | MEDLINE | ID: mdl-38240013

ABSTRACT

Inference for high-dimensional, large scale and long series dynamic systems is a challenging task in modern data science. The existing algorithms, such as particle filter or sequential importance sampler, do not scale well to the dimension of the system and the sample size of the dataset, and often suffer from the sample degeneracy issue for long series data. The recently proposed Langevinized ensemble Kalman filter (LEnKF) addresses these difficulties in a coherent way. However, it cannot be applied to the case that the dynamic system contains unknown parameters. This article proposes the so-called stochastic approximation-LEnKF for jointly estimating the states and unknown parameters of the dynamic system, where the parameters are estimated on the fly based on the state variables simulated by the LEnKF under the framework of stochastic approximation Markov chain Monte Carlo (MCMC). Under mild conditions, we prove its consistency in parameter estimation and ergodicity in state variable simulations. The proposed algorithm can be used in uncertainty quantification for long series, large scale, and high-dimensional dynamic systems. Numerical results indicate its superiority over the existing algorithms. We employ the proposed algorithm in state-space modeling of the sea surface temperature with a long short-term memory (LSTM) network, which indicates its great potential in statistical analysis of complex dynamic systems encountered in modern data science. Supplementary materials for this article are available online.

15.
J Appl Stat ; 50(11-12): 2624-2647, 2023.
Article in English | MEDLINE | ID: mdl-37529571

ABSTRACT

This paper proposes a dynamic infectious disease model for COVID-19 daily counts data and estimates the model using the Langevinized EnKF algorithm, which is scalable for large-scale spatio-temporal data, converges to the right filtering distribution, and is thus suitable for performing statistical inference and quantifying uncertainty for the underlying dynamic system. Under the framework of the proposed dynamic infectious disease model, we tested the impact of temperature, precipitation, state emergency order and stay home order on the spread of COVID-19 based on the United States county-wise daily counts data. Our numerical results show that warm and humid weather can significantly slow the spread of COVID-19, and the state emergency and stay home orders also help to slow it. This finding provides guidance and support to future policies or acts for mitigating the community transmission and lowering the mortality rate of COVID-19.

16.
Biostatistics ; 12(3): 582-93, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21209154

ABSTRACT

The resampling-based test, which often relies on permutation or bootstrap procedures, has been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. It requires repeated calculations of the test statistic on a large number of simulated data sets for its significance level assessment, and thus it could become very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can be used easily for estimating the p-value for any resampling-based test. We show through numeric simulations that the proposed procedure can be 100 to 500,000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10^(-6)). With its computational burden reduced by this proposed procedure, the versatile resampling-based test would become computationally feasible for a much wider range of applications. We demonstrate the application of the new method by applying it to a large-scale genetic association study of prostate cancer.
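The standard resampling procedure that the abstract takes as its point of comparison can be sketched as a plain permutation test. This is an illustration of that baseline only, not of the stochastic approximation procedure proposed in the article; the choice of test statistic (absolute difference in means) and the function name `permutation_p_value` are assumptions made for the example.

```python
import random


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

The smallest value this estimator can return is 1 / (n_perm + 1), so resolving a p-value below 10^(-6) requires more than a million recomputations of the statistic. That cost is the computational burden the abstract refers to.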


Subject(s)
Algorithms , Data Interpretation, Statistical , Monte Carlo Method , Stochastic Processes , Computer Simulation , Genome-Wide Association Study/methods , Humans , Male , Polymorphism, Single Nucleotide , Prostatic Neoplasms/genetics
17.
J Am Stat Assoc ; 117(540): 1981-1995, 2022.
Article in English | MEDLINE | ID: mdl-36945326

ABSTRACT

Deep learning has been the engine powering many successes of data science. However, the deep neural network (DNN), as the basic model of deep learning, is often excessively over-parameterized, causing many difficulties in training, prediction and interpretation. We propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework: the proposed method could learn a sparse DNN with at most O(n/log(n)) connections and nice theoretical guarantees such as posterior consistency, variable selection consistency and asymptotically optimal generalization bounds. In particular, we establish posterior consistency for the sparse DNN with a mixture Gaussian prior, show that the structure of the sparse DNN can be consistently determined using a Laplace approximation-based marginal posterior inclusion probability approach, and use Bayesian evidence to elicit sparse DNNs learned by an optimization method such as stochastic gradient descent in multiple runs with different initializations. The proposed method is computationally more efficient than standard Bayesian methods for large-scale sparse DNNs. The numerical results indicate that the proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection, both advancing interpretable machine learning.

18.
Bioinformatics ; 26(6): 777-83, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20110277

ABSTRACT

MOTIVATION: Chromatin immunoprecipitation (ChIP) coupled with tiling microarray (chip) experiments have been used in a wide range of biological studies such as identification of transcription factor binding sites and investigation of DNA methylation and histone modification. Hidden Markov models are widely used to model the spatial dependency of ChIP-chip data. However, parameter estimation for these models is typically either heuristic or suboptimal, leading to inconsistencies in their applications. To overcome this limitation and to develop efficient software, we propose a hidden ferromagnetic Ising model for ChIP-chip data analysis. RESULTS: We have developed a simple but powerful Bayesian hierarchical model for ChIP-chip data via a hidden Ising model. A Metropolis-within-Gibbs sampling algorithm is used to simulate from the posterior distribution of the model parameters. The proposed model naturally incorporates the spatial dependency of the data, and can be used to analyze data with various genomic resolutions and sample sizes. We illustrate the method using three publicly available datasets and various simulated datasets, and compare it with three closely related methods, namely TileMap HMM, tileHMM and BAC. We find that our method performs as well as TileMap HMM and BAC for the high-resolution data from the Affymetrix platform, but significantly outperforms the other three methods for the low-resolution data from the Agilent platform. Compared with the BAC method, which also involves MCMC simulations, our method is computationally much more efficient. AVAILABILITY: Software called iChip is freely available at http://www.bioconductor.org/. CONTACT: moq@mskcc.org.


Subject(s)
Chromatin Immunoprecipitation/methods , Oligonucleotide Array Sequence Analysis/methods , Databases, Genetic , Gene Expression Profiling , Genomics/methods
19.
BMC Med Genet ; 12: 48, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21457555

ABSTRACT

BACKGROUND: The BCL-2 (B-cell leukemia/lymphoma 2) gene has been demonstrated to be associated with breast cancer development, and a single nucleotide polymorphism (SNP; -938C > A) has been identified recently. To investigate whether this polymorphism functions as a modifier of breast cancer development, we analyzed the distribution of genotype frequency, as well as the association of genotype with clinicopathological characteristics. Furthermore, we also studied the effects of this SNP on Bcl-2 expression in vitro. METHODS: We genotyped the BCL-2 (-938C > A) in 114 patients and 107 controls, and analyzed the estrogen receptor (ER), progestogen receptor (PR), C-erbB2 and Ki67 status with immunohistochemistry (IHC). Different Bcl-2 protein levels in breast cancer cell lines were determined using western blot. A logistic regression model was applied in the statistical analysis. RESULTS: We found that the homozygous AA genotype was associated with a 2.37-fold increased risk (AA vs AC+CC) of breast cancer development, and a significant association was observed between nodal status and different genotypes of BCL-2 (-938C > A) (p = 0.014). The AA genotype was more likely to develop into lobular breast cancer (p = 0.036). The result of western blot analysis indicated that allele A was associated with the lower level of Bcl-2 expression in breast cancer cell lines. CONCLUSIONS: The AA genotype of BCL-2 (-938C > A) is associated with susceptibility to breast cancer, and this genotype is only associated with the nodal status and pathological diagnosis of breast cancer. The polymorphism has an effect on Bcl-2 expression but needs further investigation.


Subject(s)
Biomarkers, Tumor/analysis , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Genes, bcl-2 , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins c-bcl-2/genetics , Adult , Aged , Alanine , Blotting, Western , Breast Neoplasms/chemistry , Case-Control Studies , Cell Line, Tumor , Cysteine , Female , Gene Expression Regulation, Neoplastic , Genetic Predisposition to Disease , Genotype , Humans , Immunohistochemistry , Logistic Models , Lymphatic Metastasis/genetics , Middle Aged , Polymerase Chain Reaction , Polymorphism, Restriction Fragment Length , Risk Assessment , Risk Factors
20.
Biometrics ; 66(4): 1284-94, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20128774

ABSTRACT

ChIP-chip experiments are procedures that combine chromatin immunoprecipitation (ChIP) and DNA microarray (chip) technology to study a variety of biological problems, including protein-DNA interaction, histone modification, and DNA methylation. The most important feature of ChIP-chip data is that the intensity measurements of probes are spatially correlated because the DNA fragments are hybridized to neighboring probes in the experiments. We propose a simple, but powerful Bayesian hierarchical approach to ChIP-chip data through an Ising model with high-order interactions. The proposed method naturally takes into account the intrinsic spatial structure of the data and can be used to analyze data from multiple platforms with different genomic resolutions. The model parameters are estimated using the Gibbs sampler. The proposed method is illustrated using two publicly available data sets from Affymetrix and Agilent platforms, and compared with three alternative Bayesian methods, namely, Bayesian hierarchical model, hierarchical gamma mixture model, and Tilemap hidden Markov model. The numerical results indicate that the proposed method performs as well as the other three methods for the data from Affymetrix tiling arrays, but significantly outperforms the other three methods for the data from Agilent promoter arrays. In addition, we find that the proposed method has better operating characteristics in terms of sensitivities and false discovery rates under various scenarios.
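To make the Gibbs sampling step concrete, the following sketch runs a Gibbs sampler on a one-dimensional Ising chain with nearest-neighbor coupling and an external field. It is a deliberately simplified illustration: the abstract describes a Bayesian hierarchical model with high-order interactions and estimated parameters, whereas here the coupling and field are fixed and only the spins are sampled. The function name `gibbs_ising_chain` and all parameter values are assumptions.

```python
import math
import random


def gibbs_ising_chain(n_sites=50, coupling=1.0, field=0.0, n_sweeps=200, seed=0):
    """Gibbs sampler for spins in {-1, +1} on a chain with free boundaries.

    Assumed target: p(s) proportional to
    exp(coupling * sum_i s_i * s_{i+1} + field * sum_i s_i).
    """
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_sites)]
    for _ in range(n_sweeps):
        for i in range(n_sites):
            left = spins[i - 1] if i > 0 else 0
            right = spins[i + 1] if i < n_sites - 1 else 0
            # Conditional log-odds of spin +1 versus -1 given the neighbors.
            log_odds = 2.0 * (coupling * (left + right) + field)
            p_plus = 1.0 / (1.0 + math.exp(-log_odds))
            spins[i] = 1 if rng.random() < p_plus else -1
    return spins
```

A positive coupling makes neighboring sites prefer the same state, which is how an Ising prior encodes the spatial correlation between neighboring probes that the abstract highlights.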


Subject(s)
Bayes Theorem , Chromatin Immunoprecipitation/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Humans , Methods , Sensitivity and Specificity