Results 1 - 20 of 62
1.
Stat Med ; 43(14): 2713-2733, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38690642

ABSTRACT

This article presents a novel method for learning time-varying dynamic Bayesian networks. The proposed method breaks down the dynamic Bayesian network learning problem into a sequence of regression inference problems and tackles each problem using the Markov neighborhood regression technique. Notably, the method demonstrates scalability concerning data dimensionality, accommodates time-varying network structure, and naturally handles multi-subject data. The proposed method exhibits consistency and offers superior performance compared to existing methods in terms of estimation accuracy and computational efficiency, as supported by extensive numerical experiments. To showcase its effectiveness, we apply the proposed method to an fMRI study investigating the effective connectivity among various regions of interest (ROIs) during an emotion-processing task. Our findings reveal the pivotal role of the subcortical-cerebellum in emotion processing.


Subjects
Bayes Theorem , Emotions , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Emotions/physiology , Markov Chains , Brain/diagnostic imaging , Brain/physiology , Computer Simulation
2.
Biostatistics ; 22(2): 233-249, 2021 Apr 10.
Article in English | MEDLINE | ID: mdl-33838043

ABSTRACT

Motivated by the study of the molecular mechanism underlying type 1 diabetes with gene expression data collected from both patients and healthy controls at multiple time points, we propose a hybrid Bayesian method for jointly estimating multiple dependent Gaussian graphical models with data observed under distinct conditions, which avoids inversion of high-dimensional covariance matrices and thus can be executed very fast. We prove the consistency of the proposed method under mild conditions. The numerical results indicate the superiority of the proposed method over existing ones in both estimation accuracy and computational efficiency. Extension of the proposed method to joint estimation of multiple mixed graphical models is straightforward.


Subjects
Type 1 Diabetes Mellitus , Gene Regulatory Networks , Bayes Theorem , Type 1 Diabetes Mellitus/genetics , Humans , Statistical Models , Normal Distribution
3.
Stat Med ; 41(20): 4057-4078, 2022 Sep 10.
Article in English | MEDLINE | ID: mdl-35688606

ABSTRACT

High-dimensional inference is one of the fundamental problems in modern biomedical studies. However, the existing methods do not perform satisfactorily. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the Markov neighborhood regression method such that it can be applied to statistical inference for high-dimensional generalized linear models with mixed features. The Markov neighborhood regression method is highly attractive in that it breaks the high-dimensional inference problems into a series of low-dimensional inference problems. The proposed method is applied to the cancer cell line encyclopedia data for identification of the genes and mutations that are sensitive to the response of anti-cancer drugs. The numerical results favor the Markov neighborhood regression method over the existing ones.


Subjects
Statistical Models , Humans , Likelihood Functions , Linear Models , Markov Chains , Regression Analysis
4.
J Stat Comput Simul ; 92(2): 318-336, 2022.
Article in English | MEDLINE | ID: mdl-35559269

ABSTRACT

We propose a class of adaptive stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, where the drift function is adaptively adjusted according to the gradient of past samples to accelerate the convergence of the algorithm in simulations of the distributions with pathological curvatures. We establish the convergence of the proposed algorithms under mild conditions. The numerical examples indicate that the proposed algorithms can significantly outperform the popular SGMCMC algorithms, such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian Monte Carlo (SGHMC) and preconditioned SGLD, in both simulation and optimization tasks. In particular, the proposed algorithms can converge quickly for the distributions for which the energy landscape possesses pathological curvatures.
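
For orientation, the plain SGLD baseline named in this abstract can be sketched as follows. This is not the adaptive algorithm proposed in the article: the drift here is the unmodified gradient. The standard normal target, the step size, and the names `grad_log_density` and `sgld_samples` are illustrative assumptions.

```python
import math
import random


def grad_log_density(x):
    # Assumed toy target: standard normal, so d/dx log p(x) = -x.
    return -x


def sgld_samples(n_steps, step_size, x0=0.0, noise_scale=0.0, seed=0):
    """Run a one-dimensional SGLD chain and return every iterate."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        # Optional extra noise stands in for a mini-batch gradient estimate.
        grad = grad_log_density(x) + noise_scale * rng.gauss(0.0, 1.0)
        # Langevin update: half a step along the gradient plus injected
        # Gaussian noise with variance equal to the step size.
        x = x + 0.5 * step_size * grad + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

With a small step size the iterates are approximately distributed as the target; an adaptive variant of the kind the abstract describes would change how the drift term is computed.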

5.
Stat Probab Lett ; 180, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34744226

ABSTRACT

Deep learning has achieved great successes in many machine learning tasks. However, the deep neural networks (DNNs) are often severely over-parameterized, making them computationally expensive, memory intensive, less interpretable and mis-calibrated. We study sparse DNNs under the Bayesian framework: we establish posterior consistency and structure selection consistency for Bayesian DNNs with a spike-and-slab prior, and illustrate their performance using examples on high-dimensional nonlinear variable selection, large network compression and model calibration. Our numerical results indicate that sparsity is essential for improving the prediction accuracy and calibration of the DNN.

6.
Biostatistics ; 20(4): 565-581, 2019 Oct 01.
Article in English | MEDLINE | ID: mdl-29788035

ABSTRACT

Digital pathology imaging of tumor tissues, which captures histological details in high resolution, is fast becoming a routine clinical procedure. Recent developments in deep-learning methods have enabled the identification, characterization, and classification of individual cells from pathology image analysis at a large scale. This creates new opportunities to study the spatial patterns of and interactions among different types of cells. Reliable statistical approaches to modeling such spatial patterns and interactions can provide insight into tumor progression and shed light on the biological mechanisms of cancer. In this article, we consider the problem of modeling a pathology image with irregular locations of three different types of cells: lymphocyte, stromal, and tumor cells. We propose a novel Bayesian hierarchical model, which incorporates a hidden Potts model to project the irregularly distributed cells to a square lattice and a Markov random field prior model to identify regions in a heterogeneous pathology image. The model allows us to quantify the interactions between different types of cells, some of which are clinically meaningful. We use Markov chain Monte Carlo sampling techniques, combined with a double Metropolis-Hastings algorithm, in order to simulate samples approximately from a distribution with an intractable normalizing constant. The proposed model was applied to the pathology images of 205 lung cancer patients from the National Lung Screening Trial, and the results show that the interaction strength between tumor and stromal cells predicts patient prognosis (P = 0.005). This statistical methodology provides a new perspective for understanding the role of cell-cell interactions in cancer progression.


Subjects
Algorithms , Computer-Assisted Image Interpretation , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Statistical Models , Bayes Theorem , Humans , Markov Chains , Monte Carlo Method
7.
Biostatistics ; 19(2): 216-232, 2018 Apr 01.
Article in English | MEDLINE | ID: mdl-29036516

ABSTRACT

Gaussian graphical models have been widely used to construct gene regulatory networks from gene expression data. Most existing methods for Gaussian graphical models are designed to model homogeneous data, assuming a single Gaussian distribution. In practice, however, data may consist of gene expression studies with unknown confounding factors, such as study cohort, microarray platforms, and experimental batches, which produce heterogeneous data and hence lead to false positive edges or low detection power in the resulting network. To overcome this problem and improve the performance in constructing gene networks, we propose a two-stage approach to construct a gene network from heterogeneous data. The first stage is to perform a clustering analysis in order to assign samples to a few clusters where the samples in each cluster are approximately homogeneous, and the second stage is to conduct an integrative analysis of networks from each cluster. In particular, we first apply a model-based clustering method using the singular value decomposition for high-dimensional data, and then integrate the networks from each cluster using the integrative ψ-learning method. The proposed method is based on an equivalent measure of partial correlation coefficients in Gaussian graphical models, which is computed with a reduced conditional set and is thus useful for high-dimensional data. We compare the proposed two-stage learning approach with some existing methods in various simulation settings, and demonstrate the robustness of the proposed method. Finally, it is applied to integrate multiple gene expression studies of lung adenocarcinoma to identify potential therapeutic targets and treatment biomarkers.


Subjects
Biostatistics/methods , Statistical Data Interpretation , Gene Expression , Gene Regulatory Networks , Genomics/methods , Statistical Models , Cluster Analysis , Humans , Lung Neoplasms/genetics
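
The partial correlation coefficient that this abstract builds on can be illustrated with a first-order example, in which the conditioning set holds a single variable. This is only a sketch of the general quantity, not the ψ-learning procedure itself. The function names and the simulated confounded data are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_confounded(n, seed=0):
    """Hypothetical data: x and y both depend on z but not on each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [zi + rng.gauss(0.0, 0.5) for zi in z]
    y = [zi + rng.gauss(0.0, 0.5) for zi in z]
    return x, y, z
```

In the simulated data, x and y are strongly correlated marginally, yet their partial correlation given z is close to zero, which is why an edge between them would be dropped in a Gaussian graphical model.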
8.
Neural Comput ; 31(6): 1183-1214, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30979349

ABSTRACT

Bayesian networks have been widely used in many scientific fields for describing the conditional independence relationships for a large set of random variables. This letter proposes a novel algorithm, the so-called p-learning algorithm, for learning moral graphs for high-dimensional Bayesian networks. The moral graph is a Markov network representation of the Bayesian network and also the key to construction of the Bayesian network for constraint-based algorithms. The consistency of the p-learning algorithm is justified under the small-n, large-p scenario. The numerical results indicate that the p-learning algorithm significantly outperforms the existing ones, such as the PC, grow-shrink, incremental association, semi-interleaved hiton, hill-climbing, and max-min hill-climbing. Under the sparsity assumption, the p-learning algorithm has a computational complexity of O(p^2) even in the worst case, while the existing algorithms have a computational complexity of O(p^3) in the worst case.


Subjects
Algorithms , Bayes Theorem , Computer Neural Networks , Humans
9.
BMC Bioinformatics ; 18(1): 186, 2017 Mar 23.
Article in English | MEDLINE | ID: mdl-28335719

ABSTRACT

BACKGROUND: Gene regulatory networks reveal how genes work together to carry out their biological functions. Reconstructions of gene networks from gene expression data greatly facilitate our understanding of underlying biological mechanisms and provide new opportunities for biomarker and drug discoveries. In gene networks, a gene that has many interactions with other genes is called a hub gene, which usually plays an essential role in gene regulation and biological processes. In this study, we developed a method for reconstructing gene networks using a partial correlation-based approach that incorporates prior information about hub genes. Through simulation studies and two real-data examples, we compare the performance in estimating the network structures between the existing methods and the proposed method. RESULTS: In simulation studies, we show that the proposed strategy reduces errors in estimating network structures compared to the existing methods. When applied to Escherichia coli, the regulation network constructed by our proposed ESPACE method is more consistent with current biological knowledge than the SPACE method. Furthermore, application of the proposed method in lung cancer has identified hub genes whose mRNA expression predicts cancer progress and patient response to treatment. CONCLUSIONS: We have demonstrated that incorporating hub gene information in estimating network structures can improve the performance of the existing methods.


Subjects
Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Humans
10.
Biometrics ; 73(4): 1221-1230, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28294287

ABSTRACT

In recent years, next generation sequencing (NGS) has gradually replaced microarray as the major platform in measuring gene expressions. Compared to microarray, NGS has many advantages, such as less noise and higher throughput. However, the discreteness of NGS data also challenges the existing statistical methodology. In particular, the literature still lacks an appropriate statistical method for reconstructing gene regulatory networks using NGS data. The existing local Poisson graphical model method is not consistent and can only infer certain local structures of the network. In this article, we propose a random effect model-based transformation to continuize NGS data, and then we transform the continuized data to Gaussian via a semiparametric transformation and apply an equivalent partial correlation selection method to reconstruct gene regulatory networks. The proposed method is consistent. The numerical results indicate that the proposed method can lead to much more accurate inference of gene regulatory networks than the local Poisson graphical model and other existing methods. The proposed data-continuized transformation fills the theoretical gap for how to transform discrete data to continuous data and facilitates NGS data analysis. The proposed data-continuized transformation also makes it feasible to integrate different types of data, such as microarray and RNA-seq data, in reconstruction of gene regulatory networks.


Subjects
Gene Regulatory Networks/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Statistics as Topic
11.
PLoS Genet ; 8(1): e1002482, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22291610

ABSTRACT

An important follow-up step after genetic markers are found to be associated with a disease outcome is a more detailed analysis investigating how the implicated gene or chromosomal region and an established environment risk factor interact to influence the disease risk. The standard approach to this study of gene-environment interaction considers one genetic marker at a time and therefore could misrepresent and underestimate the genetic contribution to the joint effect when one or more functional loci, some of which might not be genotyped, exist in the region and interact with the environment risk factor in a complex way. We develop a more global approach based on a Bayesian model that uses a latent genetic profile variable to capture all of the genetic variation in the entire targeted region and allows the environment effect to vary across different genetic profile categories. We also propose a resampling-based test derived from the developed Bayesian model for the detection of gene-environment interaction. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the Bayesian model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region, which contains a cluster of nicotinic acetylcholine receptor genes and has been shown to be associated with both lung cancer and smoking behavior. We find evidence for gene-environment interaction (P-value = 0.016), with the smoking effect appearing to be stronger in subjects with a genetic profile associated with a higher lung cancer risk; the conventional test of gene-environment interaction based on the single-marker approach is far from significant.


Subjects
Bayes Theorem , Human Chromosome Pair 15/genetics , Disease/genetics , Gene-Environment Interaction , Lung Neoplasms/genetics , Nicotinic Receptors/genetics , Algorithms , Computer Simulation , Genetic Association Studies , Genetic Markers , Genotype , Humans , Markov Chains , Theoretical Models , Monte Carlo Method , Single Nucleotide Polymorphism , Risk Factors , Smoking
12.
Neural Netw ; 179: 106512, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39032394

ABSTRACT

Network embedding is a general-purpose machine learning technique that converts network data from non-Euclidean space to Euclidean space, facilitating downstream analyses for the networks. However, existing embedding methods are often optimization-based, with the embedding dimension determined in a heuristic or ad hoc way, which can cause potential bias in downstream statistical inference. Additionally, existing deep embedding methods can suffer from a nonidentifiability issue due to the universal approximation power of deep neural networks. We address these issues within a rigorous statistical framework. We treat the embedding vectors as missing data, reconstruct the network features using a sparse decoder, and simultaneously impute the embedding vectors and train the sparse decoder using an adaptive stochastic gradient Markov chain Monte Carlo (MCMC) algorithm. Under mild conditions, we show that the sparse decoder provides a parsimonious mapping from the embedding space to network features, enabling effective selection of the embedding dimension and overcoming the nonidentifiability issue encountered by existing deep embedding methods. Furthermore, we show that the embedding vectors converge weakly to a desired posterior distribution in the 2-Wasserstein distance, addressing the potential bias issue experienced by existing embedding methods. This work lays down the first theoretical foundation for network embedding within the framework of missing data imputation.


Subjects
Algorithms , Markov Chains , Computer Neural Networks , Monte Carlo Method , Deep Learning , Humans , Machine Learning
13.
Neural Comput ; 25(8): 2199-234, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23607562

ABSTRACT

Simulating from distributions with intractable normalizing constants has been a long-standing problem in machine learning. In this letter, we propose a new algorithm, the Monte Carlo Metropolis-Hastings (MCMH) algorithm, for tackling this problem. The MCMH algorithm is a Monte Carlo version of the Metropolis-Hastings algorithm. It replaces the unknown normalizing constant ratio by a Monte Carlo estimate in simulations, while still converging, as shown in the letter, to the desired target distribution under mild conditions. The MCMH algorithm is illustrated with spatial autologistic models and exponential random graph models. Unlike other auxiliary variable Markov chain Monte Carlo (MCMC) algorithms, such as the Møller and exchange algorithms, the MCMH algorithm avoids the requirement for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is not available or very expensive. The MCMH algorithm can also be applied to Bayesian inference for random effect models and missing data problems that involve simulations from a distribution with intractable integrals.


Subjects
Algorithms , Artificial Intelligence , Electronic Data Processing , Theoretical Models , Monte Carlo Method , Bayes Theorem , Computer Simulation , Humans , Neoplasms/mortality , Social Support
14.
J Comput Graph Stat ; 32(2): 448-469, 2023.
Article in English | MEDLINE | ID: mdl-38240013

ABSTRACT

Inference for high-dimensional, large scale and long series dynamic systems is a challenging task in modern data science. The existing algorithms, such as the particle filter or the sequential importance sampler, do not scale well to the dimension of the system and the sample size of the dataset, and often suffer from the sample degeneracy issue for long series data. The recently proposed Langevinized ensemble Kalman filter (LEnKF) addresses these difficulties in a coherent way. However, it cannot be applied when the dynamic system contains unknown parameters. This article proposes the so-called stochastic approximation-LEnKF for jointly estimating the states and unknown parameters of the dynamic system, where the parameters are estimated on the fly based on the state variables simulated by the LEnKF under the framework of stochastic approximation Markov chain Monte Carlo (MCMC). Under mild conditions, we prove its consistency in parameter estimation and ergodicity in state variable simulations. The proposed algorithm can be used in uncertainty quantification for long series, large scale, and high-dimensional dynamic systems. Numerical results indicate its superiority over the existing algorithms. We employ the proposed algorithm in state-space modeling of the sea surface temperature with a long short-term memory (LSTM) network, which indicates its great potential in statistical analysis of complex dynamic systems encountered in modern data science. Supplementary materials for this article are available online.

15.
J Appl Stat ; 50(11-12): 2624-2647, 2023.
Article in English | MEDLINE | ID: mdl-37529571

ABSTRACT

This paper proposes a dynamic infectious disease model for COVID-19 daily counts data and estimates the model using the Langevinized EnKF algorithm, which is scalable for large-scale spatio-temporal data, converges to the right filtering distribution, and is thus suitable for performing statistical inference and quantifying uncertainty for the underlying dynamic system. Under the framework of the proposed dynamic infectious disease model, we tested the impact of temperature, precipitation, state emergency order and stay home order on the spread of COVID-19 based on the United States county-wise daily counts data. Our numerical results show that warm and humid weather can significantly slow the spread of COVID-19, and the state emergency and stay home orders also help to slow it. This finding provides guidance and support to future policies or acts for mitigating the community transmission and lowering the mortality rate of COVID-19.

16.
Biostatistics ; 12(3): 582-93, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21209154

ABSTRACT

The resampling-based test, which often relies on permutation or bootstrap procedures, has been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. It requires repeated calculations of the test statistic on a large number of simulated data sets for its significance level assessment, and thus it could become very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can be used easily for estimating the p-value for any resampling-based test. We show through numeric simulations that the proposed procedure can be 100 to 500,000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10^(-6)). With its computational burden reduced by this proposed procedure, the versatile resampling-based test would become computationally feasible for a much wider range of applications. We demonstrate the application of the new method by applying it to a large-scale genetic association study of prostate cancer.


Subjects
Algorithms , Statistical Data Interpretation , Monte Carlo Method , Stochastic Processes , Computer Simulation , Genome-Wide Association Study/methods , Humans , Male , Single Nucleotide Polymorphism , Prostatic Neoplasms/genetics
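
The standard resampling-based procedure that this abstract takes as its baseline can be sketched as a plain permutation test. This is not the stochastic approximation procedure proposed in the article. The two-sample mean-difference statistic and the names `mean_difference` and `permutation_p_value` are assumptions chosen for illustration.

```python
import random


def mean_difference(xs, ys):
    return sum(xs) / len(xs) - sum(ys) / len(ys)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean_difference(xs, ys))
    pooled = list(xs) + list(ys)
    n_x = len(xs)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        stat = abs(mean_difference(pooled[:n_x], pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

The smallest value this estimator can return is 1 / (n_perm + 1), so resolving a p-value near 10^(-6) needs on the order of a million or more permutations. That cost is what motivates a more efficient evaluation procedure.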
17.
J Am Stat Assoc ; 117(540): 1981-1995, 2022.
Article in English | MEDLINE | ID: mdl-36945326

ABSTRACT

Deep learning has been the engine powering many successes of data science. However, the deep neural network (DNN), as the basic model of deep learning, is often excessively over-parameterized, causing many difficulties in training, prediction and interpretation. We propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework: the proposed method could learn a sparse DNN with at most O(n/log(n)) connections and nice theoretical guarantees such as posterior consistency, variable selection consistency and asymptotically optimal generalization bounds. In particular, we establish posterior consistency for the sparse DNN with a mixture Gaussian prior, show that the structure of the sparse DNN can be consistently determined using a Laplace approximation-based marginal posterior inclusion probability approach, and use Bayesian evidence to elicit sparse DNNs learned by an optimization method such as stochastic gradient descent in multiple runs with different initializations. The proposed method is computationally more efficient than standard Bayesian methods for large-scale sparse DNNs. The numerical results indicate that the proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection, both advancing interpretable machine learning.

18.
Bioinformatics ; 26(6): 777-83, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20110277

ABSTRACT

MOTIVATION: Chromatin immunoprecipitation (ChIP) coupled with tiling microarray (chip) experiments have been used in a wide range of biological studies such as identification of transcription factor binding sites and investigation of DNA methylation and histone modification. Hidden Markov models are widely used to model the spatial dependency of ChIP-chip data. However, parameter estimation for these models is typically either heuristic or suboptimal, leading to inconsistencies in their applications. To overcome this limitation and to develop efficient software, we propose a hidden ferromagnetic Ising model for ChIP-chip data analysis. RESULTS: We have developed a simple but powerful Bayesian hierarchical model for ChIP-chip data via a hidden Ising model. A Metropolis-within-Gibbs sampling algorithm is used to simulate from the posterior distribution of the model parameters. The proposed model naturally incorporates the spatial dependency of the data, and can be used to analyze data with various genomic resolutions and sample sizes. We illustrate the method using three publicly available datasets and various simulated datasets, and compare it with three closely related methods, namely TileMap HMM, tileHMM and BAC. We find that our method performs as well as TileMap HMM and BAC for the high-resolution data from the Affymetrix platform, but significantly outperforms the other three methods for the low-resolution data from the Agilent platform. Compared with the BAC method, which also involves MCMC simulations, our method is computationally much more efficient. AVAILABILITY: Software called iChip is freely available at http://www.bioconductor.org/. CONTACT: moq@mskcc.org.


Subjects
Chromatin Immunoprecipitation/methods , Oligonucleotide Array Sequence Analysis/methods , Genetic Databases , Gene Expression Profiling , Genomics/methods
19.
BMC Med Genet ; 12: 48, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21457555

ABSTRACT

BACKGROUND: The BCL-2 (B-cell leukemia/lymphoma 2) gene has been demonstrated to be associated with breast cancer development, and a single nucleotide polymorphism (SNP; -938C > A) has been identified recently. To investigate whether this polymorphism functions as a modifier of breast cancer development, we analyzed the distribution of genotype frequency, as well as the association of genotype with clinicopathological characteristics. Furthermore, we also studied the effects of this SNP on Bcl-2 expression in vitro. METHODS: We genotyped the BCL-2 (-938C > A) in 114 patients and 107 controls, and analyzed the estrogen receptor (ER), progestogen receptor (PR), C-erbB2 and Ki67 status with immunohistochemistry (IHC). Different Bcl-2 protein levels in breast cancer cell lines were determined using western blot. A logistic regression model was applied in the statistical analysis. RESULTS: We found that the homozygous AA genotype was associated with a 2.37-fold increased risk (AA vs AC+CC) of breast cancer development, and a significant association was observed between nodal status and different genotypes of BCL-2 (-938C > A) (p = 0.014). The AA genotype was more likely to develop into lobular breast cancer (p = 0.036). The result of western blot analysis indicated that allele A was associated with the lower level of Bcl-2 expression in breast cancer cell lines. CONCLUSIONS: The AA genotype of BCL-2 (-938C > A) is associated with susceptibility to breast cancer, and this genotype is only associated with the nodal status and pathological diagnosis of breast cancer. The polymorphism has an effect on Bcl-2 expression but needs further investigation.


Subjects
Tumor Biomarkers/analysis , Breast Neoplasms/genetics , Breast Neoplasms/pathology , bcl-2 Genes , Single Nucleotide Polymorphism , Proto-Oncogene Proteins c-bcl-2/genetics , Adult , Aged , Alanine , Western Blotting , Breast Neoplasms/chemistry , Case-Control Studies , Tumor Cell Line , Cysteine , Female , Neoplastic Gene Expression Regulation , Genetic Predisposition to Disease , Genotype , Humans , Immunohistochemistry , Logistic Models , Lymphatic Metastasis/genetics , Middle Aged , Polymerase Chain Reaction , Restriction Fragment Length Polymorphism , Risk Assessment , Risk Factors
20.
Biometrics ; 66(4): 1284-94, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20128774

ABSTRACT

ChIP-chip experiments are procedures that combine chromatin immunoprecipitation (ChIP) and DNA microarray (chip) technology to study a variety of biological problems, including protein-DNA interaction, histone modification, and DNA methylation. The most important feature of ChIP-chip data is that the intensity measurements of probes are spatially correlated because the DNA fragments are hybridized to neighboring probes in the experiments. We propose a simple, but powerful Bayesian hierarchical approach to ChIP-chip data through an Ising model with high-order interactions. The proposed method naturally takes into account the intrinsic spatial structure of the data and can be used to analyze data from multiple platforms with different genomic resolutions. The model parameters are estimated using the Gibbs sampler. The proposed method is illustrated using two publicly available data sets from Affymetrix and Agilent platforms, and compared with three alternative Bayesian methods, namely, Bayesian hierarchical model, hierarchical gamma mixture model, and Tilemap hidden Markov model. The numerical results indicate that the proposed method performs as well as the other three methods for the data from Affymetrix tiling arrays, but significantly outperforms the other three methods for the data from Agilent promoter arrays. In addition, we find that the proposed method has better operating characteristics in terms of sensitivities and false discovery rates under various scenarios.


Subjects
Bayes Theorem , Chromatin Immunoprecipitation/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Humans , Methods , Sensitivity and Specificity
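
To make the Gibbs sampling step in this abstract concrete, the sketch below runs a Gibbs sampler on a much simpler model: a one-dimensional Ising chain with nearest-neighbor interactions only and fixed parameters. It is not the high-order model of the article, where the parameters are estimated rather than fixed. The names `gibbs_sweep` and `sample_ising_chain` and the parameters `coupling` and `field` are assumptions for illustration.

```python
import math
import random


def gibbs_sweep(spins, coupling, field, rng):
    """Update every site once from its conditional given its neighbors."""
    n = len(spins)
    for i in range(n):
        neighbor_sum = 0
        if i > 0:
            neighbor_sum += spins[i - 1]
        if i < n - 1:
            neighbor_sum += spins[i + 1]
        # For p(s) proportional to exp(J * sum s_i s_{i+1} + h * sum s_i):
        # P(s_i = +1 | rest) = 1 / (1 + exp(-2 * (J * neighbor_sum + h))).
        local = coupling * neighbor_sum + field
        p_up = 1.0 / (1.0 + math.exp(-2.0 * local))
        spins[i] = 1 if rng.random() < p_up else -1
    return spins


def sample_ising_chain(n_sites, coupling, field, n_sweeps, seed=0):
    """Return one spin configuration after n_sweeps Gibbs sweeps."""
    rng = random.Random(seed)
    spins = [rng.choice([-1, 1]) for _ in range(n_sites)]
    for _ in range(n_sweeps):
        gibbs_sweep(spins, coupling, field, rng)
    return spins
```

A positive `coupling` makes neighboring sites tend to agree, which is the kind of spatial dependence between neighboring probes that the abstract describes.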