Accounting for network noise in graph-guided Bayesian modeling of structured high-dimensional data.

Li, Wenrui; Chang, Changgee; Kundu, Suprateek; Long, Qi

Affiliation

Li W; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, PA 19104, United States.
Chang C; Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, United States.
Kundu S; Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States.
Long Q; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, PA 19104, United States.

Biometrics; 80(1), 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38483282

ABSTRACT

There is a growing body of literature on knowledge-guided statistical learning methods for the analysis of structured high-dimensional data (such as genomic and transcriptomic data) that can incorporate knowledge of underlying networks derived from functional genomics and functional proteomics. These methods have been shown to improve variable selection and prediction accuracy and to yield more interpretable results. However, they typically use graphs extracted from existing databases or rely on subject matter expertise, and such graphs are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian modeling framework that accounts for network noise in regression models involving structured high-dimensional predictors. Specifically, we use two sources of network information, namely the noisy graph extracted from existing databases and the graph estimated from the observed predictors in the dataset at hand, to inform the model for the true underlying network via a latent scale modeling framework. This network model is coupled with a Bayesian regression model with structured high-dimensional predictors that uses an adaptive structured shrinkage prior. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. We demonstrate the advantages of our method over existing methods in simulations and through analyses of a genomics dataset and a proteomics dataset for Alzheimer's disease.

Subject(s)

Alzheimer Disease; Genomics; Humans; Bayes Theorem; Algorithms; Alzheimer Disease/genetics; Databases, Factual

Keywords

MCMC algorithm; adaptive Bayesian shrinkage; latent scale network model; noisy graph; structured high-dimensional prediction

Collection: 01-internacional | Database: MEDLINE | Main subject: Genomics / Alzheimer Disease | Limit: Humans | Language: English | Journal: Biometrics | Year: 2024 | Document type: Article | Country of affiliation: United States