Search | BVS Regional Portal

Negative binomial factor regression with application to microbiome data analysis.

Mishra, Aditya K; Müller, Christian L.

Stat Med ; 41(15): 2786-2803, 2022 Jul 10.

Article in English | MEDLINE | ID: mdl-35466418

ABSTRACT

The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host-microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust generalizable associations between specific host-associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured parsimonious associations between host-related features and amplicon-derived microbial taxa. To account for the overdispersed nature of the amplicon sequencing count data, we propose negative binomial reduced rank regression (NB-RRR) and negative binomial co-sparse factor regression (NB-FAR). While NB-RRR encodes the underlying dependency among the microbial abundances as outcomes and the host-associated features as predictors through a rank-constrained coefficient matrix, NB-FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit-rank components of the coefficient matrix sequentially, effectively delivering interpretable bi-clusters of taxa and host-associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block-wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host lifestyle to specific microbial families.
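To make the NB-RRR objective concrete, here is a minimal sketch (not the authors' implementation) of the negative log-likelihood such a model would minimize. It assumes a log link, the NB2 parameterization with a per-taxon size parameter theta_j (variance mu + mu^2/theta_j), counts Y of n samples by q taxa, host features X of n samples by p features, and a coefficient matrix written as C = A B^T so that its rank is at most r by construction. The helper names (nb_logpmf, nb_rrr_negloglik) are hypothetical, a per-sample sequencing-depth offset is omitted for brevity, and the paper's block-wise majorization procedure for minimizing this objective is not reproduced.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Log-probability of count y under NB2 with mean mu and size theta (Var = mu + mu**2 / theta)."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


def nb_rrr_negloglik(Y: Sequence[Sequence[int]], X: Matrix, A: Matrix, B: Matrix,
                     theta: Sequence[float]) -> float:
    """Negative log-likelihood of counts Y (n x q) given host features X (n x p).

    The coefficient matrix is parameterized as C = A B^T with A (p x r) and
    B (q x r), so rank(C) <= r holds by construction. Assumptions: log link,
    one size parameter per taxon, no sequencing-depth offset.
    """
    C = matmul(A, transpose(B))   # p x q coefficient matrix of rank <= r
    eta = matmul(X, C)            # n x q linear predictor
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            mu = math.exp(eta[i][j])  # log link: mean abundance of taxon j in sample i
            total -= nb_logpmf(y, mu, theta[j])
    return total
```

Parameterizing C through the factors A and B, rather than constraining rank directly, keeps the objective differentiable in each block, which is the kind of structure a block-wise scheme can exploit.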

Subject(s)

Data Analysis , Microbiota , Statistical Factor Analysis , Feeding Behavior , Gastrointestinal Microbiome , Humans , Life Style , Regression Analysis , United States

SOFAR: Large-Scale Association Network Learning.

Uematsu, Yoshimasa; Fan, Yingying; Chen, Kun; Lv, Jinchi; Lin, Wei.

IEEE Trans Inf Theory ; 65(8): 4924-4939, 2019 Aug.

Article in English | MEDLINE | ID: mdl-33746241

ABSTRACT

Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulations and real data examples.
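The sparse singular value decomposition underlying SOFAR (and the sequential extraction of sparse unit-rank components described for NB-FAR) can be illustrated with a generic sketch: alternating soft-thresholded power iterations yield one sparse rank-one layer d u v^T, and deflation removes it before the next layer is extracted. This is a swapped-in textbook technique in the spirit of penalized matrix decomposition, not the SOFAR algorithm; in particular it does not enforce the orthogonality between layers that SOFAR is designed to retain. The function names and the penalty levels lam_u and lam_v are illustrative assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(x: Vector, lam: float) -> Vector:
    """Shrink each entry toward zero by lam; entries with |x_i| <= lam become exactly 0."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]


def normalize(x: Vector) -> Vector:
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x] if n > 0 else x


def sparse_rank1(M: Matrix, lam_u: float, lam_v: float,
                 iters: int = 100) -> Tuple[float, Vector, Vector]:
    """One sparse unit-rank layer d * u v^T of M via alternating thresholded power steps."""
    p, q = len(M), len(M[0])
    # Assumption: uniform start; starting from the leading singular vector is a common alternative.
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(iters):
        u = normalize(soft_threshold([sum(M[i][j] * v[j] for j in range(q)) for i in range(p)], lam_u))
        v = normalize(soft_threshold([sum(M[i][j] * u[i] for i in range(p)) for j in range(q)], lam_v))
    d = sum(u[i] * M[i][j] * v[j] for i in range(p) for j in range(q))
    return d, u, v


def sparse_svd(M: Matrix, rank: int, lam_u: float,
               lam_v: float) -> List[Tuple[float, Vector, Vector]]:
    """Extract layers sequentially by deflation; orthogonality is NOT enforced here."""
    R = [row[:] for row in M]
    comps = []
    for _ in range(rank):
        d, u, v = sparse_rank1(R, lam_u, lam_v)
        comps.append((d, u, v))
        # Deflate: subtract the fitted unit-rank layer before extracting the next one.
        R = [[R[i][j] - d * u[i] * v[j] for j in range(len(R[0]))] for i in range(len(R))]
    return comps
```

The nonzero entries of each u and v pick out a bicluster of rows and columns; larger penalties give smaller biclusters at the cost of some shrinkage in d.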

Principal bicorrelation analysis: Unraveling associations between three data sources.

Mattiello, Federico; Thas, Olivier; Verbist, Bie.

J Biopharm Stat ; 26(3): 534-51, 2016.

Article in English | MEDLINE | ID: mdl-26098298

ABSTRACT

In this article, we propose a statistical explorative method for data integration. It is developed in the context of early drug development, where it enables the detection of chemical substructures and the identification of genes that mediate their association with the bioactivity (BA). The core of the method is a sparse singular value decomposition for the identification of the gene set and a permutation-based method for the control of the false discovery rate. The method is illustrated using a real dataset, and its properties are empirically evaluated by means of a simulation study. Quantitative Structure Transcriptional Activity Relationship (QSTAR, www.qstar-consortium.org ) is a new paradigm in early drug development that extends QSAR by considering not only data on the chemical structure of the compounds and on the compound-induced BA, but also transcriptomics data (gene expression). This approach enables, for example, the detection of chemical substructures that are associated with BA, together with a gene set that is correlated with both these substructures and the BA. Although causality cannot be formally established, these associations may suggest that the compounds act on the BA through a particular genomic pathway.
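The permutation-based false discovery rate control mentioned in this abstract can be sketched generically: score each gene by its absolute Pearson correlation with the bioactivity across compounds, then estimate the FDR of "select genes scoring at least t" as the average number of genes passing t when the bioactivity values are randomly permuted, divided by the number passing on the real data. This is an assumed, simplified stand-in (it scores genes against BA directly rather than through the sparse SVD), not the authors' exact procedure; all function names and defaults are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def gene_scores(genes: Sequence[Sequence[float]], response: Sequence[float]) -> List[float]:
    """Absolute correlation of each gene's profile (one value per compound) with the response."""
    return [abs(pearson(g, response)) for g in genes]


def permutation_fdr(genes: Sequence[Sequence[float]], response: Sequence[float],
                    threshold: float, n_perm: int = 200, seed: int = 0) -> float:
    """Estimated FDR of the rule 'select genes with score >= threshold'."""
    rng = random.Random(seed)
    n_selected = sum(s >= threshold for s in gene_scores(genes, response))
    null_hits = 0
    perm = list(response)
    for _ in range(n_perm):
        rng.shuffle(perm)  # break the gene/response link while keeping each marginal intact
        null_hits += sum(s >= threshold for s in gene_scores(genes, perm))
    expected_false = null_hits / n_perm  # average number of selections under the null
    return min(1.0, expected_false / max(n_selected, 1))
```

In practice one would scan several thresholds and keep the smallest one whose estimated FDR stays below a target level such as 0.05.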

Subject(s)

Drug Design , Gene Expression Profiling , Quantitative Structure-Activity Relationship , Computer Simulation , Statistical Data Interpretation , Gene Expression
