1.
Stat Med ; 43(20): 3830-3861, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-38922944

ABSTRACT

Brain functional connectivity can typically be represented as a brain functional network, in which nodes represent regions of interest (ROIs) and edges represent their connections. Studying group differences in brain functional connectivity can help identify brain regions and recover the brain functional network linked to neurodegenerative diseases. This process, known as differential network analysis, focuses on the differences between the estimated precision matrices of two groups. Current methods struggle with individual heterogeneity in measuring brain connectivity, with false discovery rate (FDR) control, and with accounting for confounding factors, resulting in biased estimates and diminished power. To address these issues, we present a two-stage FDR-controlled feature selection method for differential network analysis using functional magnetic resonance imaging (fMRI) data. First, we construct individual brain connectivity measures using a high-dimensional precision matrix estimation technique. Next, we devise a penalized logistic regression model that uses the individual brain connectivity data and integrates a new knockoff filter for FDR control when detecting significant differential edges. Through extensive simulations, we show that our approach outperforms other methods. Additionally, we apply our technique to fMRI data to identify differential edges between Alzheimer's disease and control groups. Our results are consistent with prior experimental studies, emphasizing the practical applicability of our method.
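The FDR-control step can be illustrated with the standard knockoff+ selection rule of Barber and Candès. This is a minimal sketch under stated assumptions, not the paper's new knockoff filter: it assumes one statistic W_j per candidate edge has already been computed from the original and knockoff features (for example, a difference of penalized logistic-regression coefficient magnitudes), with large positive values favoring the original edge. The function name `knockoff_plus_select` is hypothetical.

```python
def knockoff_plus_select(w, q):
    """Indices selected by the knockoff+ threshold at target FDR level q.

    w[j] is a precomputed statistic for candidate j; large positive values
    suggest the original feature beats its knockoff copy.
    """
    # Candidate thresholds: the distinct nonzero magnitudes |w_j|, smallest first.
    for t in sorted({abs(x) for x in w if x != 0}):
        # Knockoff+ estimate of the false discovery proportion at threshold t.
        fdp_hat = (1 + sum(1 for x in w if x <= -t)) / max(1, sum(1 for x in w if x >= t))
        if fdp_hat <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []  # no threshold reaches the target level
```

For example, with statistics `[5, 4, 3, 2, 1, -0.5]` and a target level of 0.5, the first five indices are selected, while at 0.1 nothing is selected.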


Subject(s)
Alzheimer Disease , Brain , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Brain/physiology , Computer Simulation , Logistic Models , Nerve Net/diagnostic imaging , Nerve Net/physiology , Connectome/methods
2.
Entropy (Basel) ; 26(8)2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39202133

ABSTRACT

The Kullback-Leibler divergence is a measure of the divergence between two probability distributions, often used in statistics and information theory. However, exact expressions for it are not known for multivariate or matrix-variate distributions apart from a few cases. In this paper, exact expressions for the Kullback-Leibler divergence are derived for over twenty multivariate and matrix-variate distributions. The expressions involve various special functions.
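For illustration, the matrix normal case shows the kind of closed form involved. Assume X is n × p with X ~ MN(M, U, V), i.e. vec(X) ~ N_{np}(vec M, V ⊗ U), with row covariance U (n × n) and column covariance V (p × p). The divergence then follows from the multivariate normal formula together with (V ⊗ U)^{-1} = V^{-1} ⊗ U^{-1}, tr(A ⊗ B) = tr(A) tr(B), and det(V ⊗ U) = det(V)^n det(U)^p. This is the standard result, not necessarily the form in which the paper states it:

```latex
\mathrm{KL}\bigl(\mathcal{MN}(M_0,U_0,V_0)\,\|\,\mathcal{MN}(M_1,U_1,V_1)\bigr)
= \frac{1}{2}\Bigl[
  \operatorname{tr}(V_1^{-1}V_0)\,\operatorname{tr}(U_1^{-1}U_0)
  + \operatorname{tr}\!\bigl(V_1^{-1}(M_1-M_0)^{\top}U_1^{-1}(M_1-M_0)\bigr)
  - np
  + p\ln\frac{\det U_1}{\det U_0}
  + n\ln\frac{\det V_1}{\det V_0}
\Bigr]
```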

3.
Stat Med ; 42(20): 3616-3635, 2023 09 10.
Article in English | MEDLINE | ID: mdl-37314066

ABSTRACT

Motivated by diagnosing COVID-19 using two-dimensional (2D) image biomarkers from computed tomography (CT) scans, we propose a novel latent matrix-factor regression model to predict responses that may come from an exponential family distribution, where the covariates include high-dimensional matrix-variate biomarkers. A latent generalized matrix regression (LaGMaR) is formulated, in which the latent predictor is a low-dimensional matrix factor score extracted from the low-rank signal of the matrix variate through a recent matrix factor model. Unlike existing approaches, which penalize the vectorized covariate and require tuning parameters, LaGMaR performs dimension reduction that respects the intrinsic 2D geometric structure of the matrix covariate and thus avoids iteration. This greatly reduces the computational burden while preserving structural information, so that the latent matrix factor feature can replace the matrix variate, which is intractable owing to its high dimensionality. The LaGMaR estimation procedure is derived by transforming the bilinear-form matrix factor model into a high-dimensional vector factor model, so that the method of principal components can be applied. We establish bilinear-form consistency of the estimated matrix coefficient of the latent predictor and consistency of prediction. The proposed approach is convenient to implement. In simulation experiments under diverse scenarios of generalized matrix regression, LaGMaR shows better prediction performance than several existing penalized methods. In an application to a real COVID-19 dataset, the proposed approach predicts COVID-19 efficiently.


Subject(s)
COVID-19 , Humans , Computer Simulation , Biomarkers
4.
Biostatistics ; 22(2): 402-420, 2021 04 10.
Article in English | MEDLINE | ID: mdl-31631218

ABSTRACT

Inferring brain connectivity networks and quantifying the significance of interactions between brain regions are of paramount importance in neuroscience. Although some tests for graph inference based on independent samples have recently emerged, there is no readily available solution for testing changes in the brain network with paired, correlated samples. In this article, we develop a paired test of matrix graphs to infer the brain connectivity network when the groups of samples are correlated. The proposed test statistic is both bias-corrected and variance-corrected, and achieves a small estimation error rate. The subsequent multiple testing procedure built on this test statistic is guaranteed to asymptotically control the false discovery rate at the prespecified level. Both the methodology and the theory of the new test differ considerably from the two-independent-samples framework, owing to the strong correlations between measurements on the same subjects before and after the stimulus activity. We illustrate the efficacy of our proposal through simulations and an analysis of an Alzheimer's Disease Neuroimaging Initiative dataset.


Subject(s)
Alzheimer Disease , Magnetic Resonance Imaging , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Humans , Neuroimaging
5.
Biostatistics ; 21(2): e80-e97, 2020 04 01.
Article in English | MEDLINE | ID: mdl-30371748

ABSTRACT

Epidemiological studies of periodontal disease (PD) collect relevant biomarkers, such as the clinical attachment level (CAL) and the probed pocket depth (PPD), at prespecified tooth sites clustered within a subject's mouth, along with various other demographic and biological risk factors. Routine cross-sectional evaluations are conducted under a linear mixed model (LMM) framework with normality assumptions on the random terms. However, careful investigation reveals considerable non-normality in those random terms, in the form of skewness and tail behavior. In addition, PD progression is hypothesized to be spatially referenced, i.e., disease status at proximal tooth sites may differ from that at distally located sites, and tooth missingness is non-random (informative), since the number and location of missing teeth carry information about periodontal health in that region. To address these complexities, we consider a matrix-variate skew-t formulation of the LMM with a Markov graphical embedding to handle the site-level spatial associations of the bivariate (PPD and CAL) responses. Within the same framework, the non-randomly missing responses are imputed via a latent probit regression of the missingness indicator on the responses. Our hierarchical Bayesian framework, powered by relevant Markov chain Monte Carlo steps, addresses these complexities within a unified paradigm and estimates model parameters with seamless sharing of information across the stages of the hierarchy. Using both synthetic and real clinical data assessing PD status, we demonstrate a significantly improved fit of our proposed model over various alternative models.


Subject(s)
Biostatistics/methods , Models, Statistical , Computer Simulation , Humans , Periodontal Diseases/epidemiology
6.
Entropy (Basel) ; 23(6)2021 Jun 15.
Article in English | MEDLINE | ID: mdl-34203893

ABSTRACT

In physics, communication theory, engineering, statistics, and other areas, one of the methods of deriving distributions is the optimization of an appropriate measure of entropy under relevant constraints. In this paper, it is shown that by optimizing a measure of entropy introduced by the second author, one can derive densities of univariate, multivariate, and matrix-variate distributions in the real, as well as complex, domain. Several such scalar, multivariate, and matrix-variate distributions are derived. These include multivariate and matrix-variate Maxwell-Boltzmann and Rayleigh densities in the real and complex domains, multivariate Student-t, Cauchy, matrix-variate type-1 beta, type-2 beta, and gamma densities and their generalizations.

7.
Biostatistics ; 18(2): 214-229, 2017 04 01.
Article in English | MEDLINE | ID: mdl-27578805

ABSTRACT

Many modern neuroimaging studies acquire large spatial images of the brain observed sequentially over time. Such data are often stored in the form of matrices. To model these matrix-variate data, we introduce a class of separable processes using explicit latent process modeling. To account for the size and two-way structure of the data, we extend principal component analysis to achieve dimensionality reduction at the individual level. We introduce the necessary identifiability conditions for each model and develop scalable estimation procedures. The method is motivated by, and applied to, a functional magnetic resonance imaging study designed to analyze the relationship between pain and brain activity.


Subject(s)
Brain Mapping/methods , Magnetic Resonance Imaging/methods , Principal Component Analysis , Humans
8.
Biometrics ; 73(3): 780-791, 2017 09.
Article in English | MEDLINE | ID: mdl-27959470

ABSTRACT

Brain connectivity analysis is now at the forefront of neuroscience research. A connectivity network is characterized by a graph, in which nodes represent neural elements such as neurons and brain regions, and links represent statistical dependence, often encoded in terms of partial correlation. Such a graph is inferred from matrix-valued neuroimaging data such as electroencephalography and functional magnetic resonance imaging. There have been a good number of successful proposals for sparse precision matrix estimation under the normal or matrix normal distribution; however, this family of solutions does not offer a direct quantification of statistical significance for the estimated links. In this article, we adopt a matrix normal distribution framework and formulate brain connectivity analysis as a precision matrix hypothesis testing problem. Based on the separable spatial-temporal dependence structure, we develop oracle and data-driven procedures to test both the global hypothesis that all spatial locations are conditionally independent, and simultaneous tests for identifying conditionally dependent spatial locations with false discovery rate control. Our theoretical results show that the data-driven procedures perform asymptotically as well as the oracle procedures and enjoy certain optimality properties. The empirical finite-sample performance of the proposed tests is studied via extensive simulations, and the new tests are applied to a real electroencephalography dataset.


Subject(s)
Brain , Electroencephalography , Humans , Magnetic Resonance Imaging , Models, Neurological
9.
J Appl Stat ; 51(10): 2025-2038, 2024.
Article in English | MEDLINE | ID: mdl-39071246

ABSTRACT

Recently, two-way or longitudinal functional data analysis has attracted much attention in many fields. However, little is known about how to appropriately characterize the association between a two-way functional predictor and a scalar response. Motivated by a mortality study, in this paper we propose a novel two-way functional linear model, in which the response is a scalar and the functional predictor is a two-way trajectory. The model is intuitive and interpretable, and naturally captures the relationship between each way of the two-way functional predictor and the scalar response. Further, we develop a new method to estimate the regression functions in the framework of weak separability. The main technical tools for constructing the regression functions are product functional principal component analysis and an iterative least squares procedure. The solid performance of our method is demonstrated in extensive simulation studies. We also analyze the mortality dataset to illustrate the usefulness of the proposed procedure.

10.
Adv Data Anal Classif ; 17(2): 323-345, 2023.
Article in English | MEDLINE | ID: mdl-35529071

ABSTRACT

The nonparametric formulation of density-based clustering, known as modal clustering, draws a correspondence between groups and the attraction domains of the modes of the density function underlying the data. Its probabilistic foundation allows for a natural, yet not trivial, generalization of the approach to the matrix-valued setting, increasingly widespread, for example, in longitudinal and multivariate spatio-temporal studies. In this work we introduce nonparametric estimators of matrix-variate distributions based on kernel methods, and analyze their asymptotic properties. Additionally, we propose a generalization of the mean-shift procedure for the identification of the modes of the estimated density. Given the intrinsic high dimensionality of matrix-variate data, we discuss some locally adaptive solutions to handle the problem. We test the procedure via extensive simulations, also with respect to some competitors, and illustrate its performance through two high-dimensional real data applications.
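The mean-shift idea can be sketched for matrix-valued observations under a simplifying assumption: an isotropic Gaussian kernel with a single bandwidth h (a hypothetical parameter), which is a matrix normal kernel whose row and column covariances are both scaled identities. With a Gaussian kernel, each mean-shift step replaces the current point by the kernel-weighted average of the observations. The paper's estimators and its locally adaptive solutions are not reproduced here, and the function names are hypothetical.

```python
import math


def frob_sq(a, b):
    """Squared Frobenius distance between two equally sized matrices (lists of rows)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def mean_shift_mode(start, data, h, tol=1e-10, max_iter=500):
    """Move `start` uphill on a Gaussian kernel density estimate over matrices.

    Assumes the isotropic kernel exp(-||X - X_i||_F^2 / (2 h^2)) and a start
    point within a few bandwidths of the data (otherwise all weights underflow).
    """
    x = [row[:] for row in start]
    rows, cols = len(x), len(x[0])
    for _ in range(max_iter):
        weights = [math.exp(-frob_sq(x, xi) / (2 * h * h)) for xi in data]
        total = sum(weights)
        # Mean-shift update: kernel-weighted average of the observed matrices.
        new = [[sum(w * xi[r][c] for w, xi in zip(weights, data)) / total
                for c in range(cols)] for r in range(rows)]
        if frob_sq(new, x) < tol ** 2:
            return new
        x = new
    return x
```

Repeating the climb from every observation and grouping the observations that reach the same mode gives the modal clusters.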

11.
Stat Comput ; 32(3): 53, 2022.
Article in English | MEDLINE | ID: mdl-35730052

ABSTRACT

Hidden Markov models (HMMs) have been extensively used in the univariate and multivariate literature. However, interest in the analysis of matrix-variate data has increased in recent years. In this manuscript, we introduce HMMs for matrix-variate balanced longitudinal data by assuming a matrix normal distribution in each hidden state. Such data are arranged in a four-way array. To address possible overparameterization, we consider the eigen decomposition of the covariance matrices, leading to a total of 98 HMMs. An expectation-conditional maximization algorithm is discussed for parameter estimation. The proposed models are first investigated on simulated data in terms of parameter recovery, computational time, and model selection. They are then fitted to a four-way real dataset concerning the unemployment rates of the Italian provinces, evaluated by gender and age class, over the last 16 years.
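Likelihood evaluation for such models rests on the forward recursion over hidden states, which also underlies the E-step of EM-type algorithms. The following is a minimal, generic sketch (not the paper's ECM algorithm): it assumes the per-state emission log-densities, which would be matrix normal in this setting, are supplied by the caller, and the function name is hypothetical.

```python
import math


def forward_log_likelihood(init, trans, log_emis):
    """Log-likelihood of one observed sequence under an HMM (scaled forward algorithm).

    init[s]: initial probability of state s.
    trans[r][s]: probability of moving from state r to state s.
    log_emis[t][s]: log-density of the observation at time t under state s
        (for example a matrix normal log-density; computing it is not shown).
    """
    k = len(init)
    loglik = 0.0
    alpha = []
    for t, row in enumerate(log_emis):
        shift = max(row)  # subtract the largest log-density to avoid underflow
        emis = [math.exp(v - shift) for v in row]
        if t == 0:
            alpha = [init[s] * emis[s] for s in range(k)]
        else:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(k)) * emis[s]
                     for s in range(k)]
        c = sum(alpha)
        loglik += math.log(c) + shift
        alpha = [a / c for a in alpha]  # rescale so alpha sums to one
    return loglik
```

Rescaling at every step keeps the recursion stable for long sequences, and the accumulated log scale factors recover the exact log-likelihood.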

12.
Front Comput Neurosci ; 16: 1046310, 2022.
Article in English | MEDLINE | ID: mdl-36387303

ABSTRACT

Brain functional networks (BFNs) are widely used in the electroencephalography (EEG)-based diagnosis of major depressive disorder (MDD). Typically, a BFN is constructed by calculating the functional connectivity (FC) between each pair of channels. However, this ignores high-order relationships (e.g., relationships among multiple channels), making it a low-order network. To address this issue, a novel classification framework based on the matrix variate normal distribution (MVND) is proposed in this study. The framework can simultaneously generate high- and low-order BFNs and has a clear mathematical interpretation. Specifically, the entire time series is first divided into multiple epochs. For each epoch, a BFN is constructed by calculating the phase lag index (PLI) between different EEG channels. The BFNs are then used as samples, and the likelihood of the MVND is maximized to simultaneously estimate its low-order BFN (Lo-BFN) and high-order BFN (Ho-BFN). In addition, to address the excessively high dimensionality of the Ho-BFN, Kronecker product decomposition is used for dimensionality reduction while retaining the original high-order information. The experimental results, obtained on 24 patients and 24 normal controls, verified the effectiveness of the Ho-BFN for MDD diagnosis. We further investigated the selected discriminative Lo-BFN and Ho-BFN features and found that features extracted from different networks can provide complementary information, which is beneficial for MDD diagnosis.
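The PLI for one channel pair can be computed as the absolute mean sign of the phase difference. This sketch assumes the instantaneous phases of each epoch were already extracted (commonly via the Hilbert transform, not shown); the function name is hypothetical.

```python
import math


def phase_lag_index(phase_x, phase_y):
    """Phase lag index between two channels from their instantaneous phases (radians).

    PLI = |mean(sign(sin(phi_x(t) - phi_y(t))))|: 0 means no consistent
    lead/lag relation, 1 means one channel consistently leads the other.
    """
    if len(phase_x) != len(phase_y) or not phase_x:
        raise ValueError("phase series must be non-empty and of equal length")
    total = 0.0
    for px, py in zip(phase_x, phase_y):
        s = math.sin(px - py)
        total += (s > 0) - (s < 0)  # sign of the wrapped phase difference
    return abs(total / len(phase_x))
```

Computing it for every channel pair within an epoch yields that epoch's BFN, which then serves as one sample for the MVND likelihood.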

13.
Front Neurosci ; 16: 872848, 2022.
Article in English | MEDLINE | ID: mdl-35573311

ABSTRACT

The brain functional network (BFN) has become an increasingly important tool for understanding the inherent organization of the brain and exploring informative biomarkers of neurological disorders. Pearson's correlation (PC) is the most widely accepted method for constructing BFNs and provides a basis for designing new BFN estimation schemes. In particular, a recent study proposed using two sequential PC operations, namely correlation's correlation (CC), to construct the high-order BFN. Despite its empirical effectiveness in identifying neurological disorders and detecting subtle changes in connections across subject groups, CC is defined intuitively, without a solid theoretical foundation. To understand CC more rigorously and to provide a systematic BFN learning framework, in this paper we reformulate it from a Bayesian viewpoint with a matrix-variate normal prior. As a result, we obtain a probabilistic explanation of CC. In addition, we develop a Bayesian high-order method (BHM) that automatically and simultaneously estimates the high- and low-order BFNs within this probabilistic framework. An efficient optimization algorithm is also proposed. Finally, we evaluate BHM on identifying subjects with autism spectrum disorder (ASD) versus typical controls based on the estimated BFNs. Experimental results suggest that the automatically learned high- and low-order BFNs outperform the manually defined BFNs obtained via conventional CC and PC.
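Correlation's correlation can be sketched as two PC passes. This assumes one common formulation: the low-order FC is the PC matrix of the ROI time series, and the high-order FC between ROIs i and j is the PC between rows i and j of that matrix (rows are kept whole, diagonal included, for simplicity). It does not implement the paper's Bayesian BHM, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def low_order_fc(series):
    """Low-order FC: PC between every pair of ROI time series."""
    k = len(series)
    return [[pearson(series[i], series[j]) for j in range(k)] for i in range(k)]


def correlations_correlation(series):
    """High-order FC: PC between rows (connectivity profiles) of the low-order FC."""
    fc = low_order_fc(series)
    k = len(fc)
    return [[pearson(fc[i], fc[j]) for j in range(k)] for i in range(k)]
```

Two ROIs receive a high high-order connection when their whole connectivity profiles look alike, even if their own time series are only weakly correlated.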

14.
Front Neurosci ; 15: 810431, 2021.
Article in English | MEDLINE | ID: mdl-35221892

ABSTRACT

The functional connectivity network (FCN) calculated from resting-state functional magnetic resonance imaging (rs-fMRI) plays an increasingly important role in the exploration of neurological and mental diseases. Among existing approaches, constructing the FCN based on Matrix Variate Normal Distribution (MVND) theory provides a novel perspective that can capture both low- and high-order correlations simultaneously with clear mathematical interpretability. However, when fitting the MVND model, the dimension of the parameters to be estimated (i.e., the population mean and population covariance) is too high relative to the small number of samples, which is insufficient for accurate fitting. To address this issue, we divide the brain network into several sub-networks and apply the MVND-based FCN construction algorithm within each sub-network; this reduces the spatial dimension of the MVND and yields more accurate estimates of the low- and high-order FCNs. Furthermore, to recover the functional connectivity lost through the sub-network division, the mean rs-fMRI series of all sub-networks are calculated, and the low- and high-order FCNs across sub-networks are estimated with the MVND-based FCN construction method. To demonstrate the superiority and effectiveness of this method, we design and conduct classification experiments on ASD patients and normal controls. The experimental results show that the classification accuracy of the "hierarchical sub-network method" is greatly improved, and the sub-network found to be most related to ASD in our experiment is consistent with other related medical research.

15.
Front Artif Intell ; 4: 674166, 2021.
Article in English | MEDLINE | ID: mdl-34056581

ABSTRACT

Networks are a useful tool for describing relationships among financial firms, and network analysis has been used extensively in recent years to study financial connectedness. An often-neglected aspect is that network observations come with errors from different sources, such as estimation and measurement errors; thus, a proper statistical treatment of the data is needed before network analysis can be performed. We show that node centrality measures can be heavily affected by random errors, and we propose a flexible model based on the matrix-variate t distribution, together with a Bayesian inference procedure, to de-noise the data. We provide an application to a network of European financial institutions.

16.
J Appl Stat ; 47(10): 1739-1756, 2020.
Article in English | MEDLINE | ID: mdl-35707136

ABSTRACT

We consider the clustering of repeatedly measured 'min-max' type interval-valued data. We treat the data as matrix variate data and assume that the covariance matrix is separable for model-based clustering (M-clustering). Using a separable covariance matrix brings several advantages to M-clustering, including fewer samples required for a valid procedure. In addition, a numerical study shows that this structured matrix allows us to find the correct number of clusters more accurately than other commonly assumed covariance matrices. We apply M-clustering with various covariance structures to cluster the longitudinal blood pressure data from the National Heart, Lung, and Blood Institute Growth and Health Study (NGHS).
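One way to see why a separable covariance needs fewer samples is to count parameters. Under a separable model, an n × p observation X (column-stacking vec) has Cov(vec X) = V ⊗ U, with U the n × n row covariance and V the p × p column covariance. The sketch below, with hypothetical function names, builds a Kronecker product and compares free-parameter counts; one parameter is subtracted in the separable case because V ⊗ U = (cV) ⊗ (U/c) for any c > 0.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def covariance_parameter_counts(n, p):
    """Free covariance parameters for vec(X), X an n x p matrix observation.

    Returns (unstructured, separable). The separable count drops one
    parameter because the scale can be moved freely between V and U.
    """
    unstructured = n * p * (n * p + 1) // 2
    separable = n * (n + 1) // 2 + p * (p + 1) // 2 - 1
    return unstructured, separable
```

For instance, hypothetical min and max values (n = 2) recorded at p = 10 visits give 210 free covariance parameters unstructured versus 57 under separability.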

17.
Front Neuroinform ; 12: 3, 2018.
Article in English | MEDLINE | ID: mdl-29467643

ABSTRACT

The functional connectivity (FC) network has become an increasingly useful tool for understanding cerebral working mechanisms and mining sensitive biomarkers for neural/mental disease diagnosis. Currently, Pearson's Correlation (PC) is the simplest and most commonly used scheme for FC estimation. Despite its empirical effectiveness, PC only encodes low-order (i.e., second-order) statistics by calculating pairwise correlations between network nodes (brain regions), and it fails to capture the high-order information involved in FC (e.g., the correlations among different edges in a network). To address this issue, we propose a novel FC estimation method based on the Matrix Variate Normal Distribution (MVND), which can capture both low- and high-order correlations simultaneously with clear mathematical interpretability. Specifically, we first generate a set of BOLD subseries using a sliding window scheme, and for each subseries we construct a temporal FC network by PC. Then, we use the constructed FC networks as samples to estimate the final low- and high-order FC networks by maximizing the likelihood of the MVND. To illustrate the effectiveness of the proposed method, we conduct experiments to distinguish subjects with Mild Cognitive Impairment (MCI) from Normal Controls (NCs). Experimental results show that fusing low- and high-order FCs can generally help improve the final classification performance, even though the high-order FC may contain less discriminative information than its low-order counterpart. Importantly, the proposed method for simultaneous estimation of low- and high-order FCs can achieve better classification performance than the two baseline methods, i.e., the original PC method and a recent high-order FC estimation method.
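The sample-generation step described above (sliding windows, then one PC network per window) can be sketched as follows. Window width and step are hypothetical parameters, the function names are hypothetical, and the subsequent MVND maximum-likelihood estimation of the low- and high-order networks is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sliding_window_networks(series, width, step):
    """Build one temporal FC network (k x k PC matrix) per sliding window.

    series: list of k ROI BOLD time series of equal length.
    Each window must contain non-constant values for every ROI.
    """
    length, k = len(series[0]), len(series)
    nets = []
    for start in range(0, length - width + 1, step):
        seg = [s[start:start + width] for s in series]
        nets.append([[pearson(seg[i], seg[j]) for j in range(k)] for i in range(k)])
    return nets
```

The returned list of matrices plays the role of the sample set whose MVND likelihood is then maximized.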

18.
J Comput Graph Stat ; 23(4): 985-1008, 2014.
Article in English | MEDLINE | ID: mdl-25364221

ABSTRACT

We consider the task of simultaneously clustering the rows and columns of a large transposable data matrix. We assume that the matrix elements are normally distributed with a bicluster-specific mean term and a common variance, and perform biclustering by maximizing the corresponding log likelihood. We apply an ℓ1 penalty to the means of the biclusters in order to obtain sparse and interpretable biclusters. Our proposal amounts to a sparse, symmetrized version of k-means clustering. We show that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of our proposal, and that a relaxation of our proposal yields the singular value decomposition. In addition, we propose a framework for biclustering based on the matrix-variate normal distribution. The performance of our proposals is demonstrated in a simulation study and on a gene expression dataset. This article has supplementary material online.

19.
J R Stat Soc Series B Stat Methodol ; 74(4): 721-743, 2012 Sep.
Article in English | MEDLINE | ID: mdl-34880705

ABSTRACT

We consider the problem of large-scale inference on the row or column variables of data in the form of a matrix. Many of these data matrices are transposable, meaning that neither the row variables nor the column variables can be considered independent instances. An example of this scenario is detecting significant genes in microarrays when the samples may be dependent due to latent variables or unknown batch effects. By modeling this matrix data using the matrix-variate normal distribution, we study and quantify the effects of row and column correlations on procedures for large-scale inference. We then propose a simple solution to the myriad problems presented by unanticipated correlations: we simultaneously estimate row and column covariances and use these to sphere, or de-correlate, the noise in the underlying data before conducting inference. This procedure yields data with approximately independent rows and columns, so that test statistics more closely follow null distributions and multiple testing procedures correctly control the desired error rates. Results on simulated models and real microarray data demonstrate major advantages of this approach: (1) increased statistical power, (2) less bias in estimating the false discovery rate, and (3) reduced variance of the false discovery rate estimators.
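The sphering step can be sketched as follows, assuming the row covariance U and column covariance V have already been estimated (the paper estimates them jointly; that step is not shown) and that the data are centered. With Cholesky factors U = L_U L_U^T and V = L_V L_V^T, if X ~ MN(0, U, V) then L_U^{-1} X L_V^{-T} has identity row and column covariance. Cholesky is one choice of square root; a symmetric inverse square root would also work. Function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def transpose(m):
    return [list(r) for r in zip(*m)]


def left_whiten(x, cov):
    """Return L^{-1} x, where cov = L L^T (decorrelates along the row index)."""
    L = cholesky(cov)
    return transpose([forward_solve(L, list(col)) for col in zip(*x)])


def sphere(x, row_cov, col_cov):
    """Return L_U^{-1} x L_V^{-T}, removing row and column correlation."""
    y = left_whiten(x, row_cov)
    return transpose(left_whiten(transpose(y), col_cov))
```

Test statistics computed on the sphered matrix then behave closer to the independent-data case that standard multiple testing procedures assume.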

20.
Ann Appl Stat ; 4(2): 764-790, 2010 Jun.
Article in English | MEDLINE | ID: mdl-26877823

ABSTRACT

Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that the rows, the columns, or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
