Results 1 - 13 of 13
1.
Stat Sin ; 34(2): 527-546, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38655129

ABSTRACT

Multi-modal data are prevalent in many scientific fields. In this study, we consider the parameter estimation and variable selection for a multi-response regression using block-missing multi-modal data. Our method allows the dimensions of both the responses and the predictors to be large, and the responses to be incomplete and correlated, a common practical problem in high-dimensional settings. Our proposed method uses two steps to make a prediction from a multi-response linear regression model with block-missing multi-modal predictors. In the first step, without imputing missing data, we use all available data to estimate the covariance matrix of the predictors and the cross-covariance matrix between the predictors and the responses. In the second step, we use these matrices and a penalized method to simultaneously estimate the precision matrix of the response vector, given the predictors, and the sparse regression parameter matrix. Lastly, we demonstrate the effectiveness of the proposed method using theoretical studies, simulated examples, and an analysis of a multi-modal imaging data set from the Alzheimer's Disease Neuroimaging Initiative.
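
The first step described in this abstract (estimating covariances from all available data, without imputation) can be illustrated with a pairwise available-case estimator. This is only a minimal sketch under stated assumptions, not the authors' estimator: missing entries are assumed to be coded as NaN, every pair of variables is assumed to be jointly observed at least once, and the function name `available_case_cov` is hypothetical.

```python
import numpy as np

def available_case_cov(X, Y):
    """Pairwise available-case estimates of Cov(X) and Cov(X, Y).

    Assumption: missing entries are NaN and every pair of columns
    is observed together in at least one row.
    """
    Z = np.hstack([X, Y])
    obs = ~np.isnan(Z)                          # True where a value is observed
    n_obs = obs.sum(axis=0)                     # observed count per column
    mean = np.where(obs, Z, 0.0).sum(axis=0) / n_obs
    Zc = np.where(obs, Z - mean, 0.0)           # centered; zeros where missing
    pair_counts = obs.T.astype(float) @ obs.astype(float)
    S = (Zc.T @ Zc) / pair_counts               # average over jointly observed rows
    p = X.shape[1]
    return S[:p, :p], S[:p, p:]                 # Cov(X), Cov(X, Y)
```

With no missing entries this reduces to the usual (biased) sample covariance. A matrix built this way is not guaranteed to be positive semi-definite, which is one reason a penalized second step, as the abstract describes, is typically needed.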

2.
Sensors (Basel) ; 23(1)2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36616709

ABSTRACT

Online multi-microphone speech enhancement aims to extract target speech from multiple noisy inputs by exploiting the spatial information as well as the spectro-temporal characteristics with low latency. Acoustic parameters such as the acoustic transfer function and speech and noise spatial covariance matrices (SCMs) should be estimated in a causal manner to enable the online estimation of the clean speech spectra. In this paper, we propose an improved estimator for the speech SCM, which can be parameterized with the speech power spectral density (PSD) and relative transfer function (RTF). Specifically, we adopt the temporal cepstrum smoothing (TCS) scheme to estimate the speech PSD, which is conventionally estimated with temporal smoothing. Furthermore, we propose a novel RTF estimator based on a time difference of arrival (TDoA) estimate obtained by the cross-correlation method. In addition, we propose refining the initial estimate of the speech SCM by utilizing the estimates for the clean speech spectrum and clean speech power spectrum. The proposed approach showed superior performance in terms of the perceptual evaluation of speech quality (PESQ) scores, extended short-time objective intelligibility (eSTOI), and scale-invariant signal-to-distortion ratio (SISDR) in our experiments on the CHiME-4 database.


Subject(s)
Speech Perception , Speech , Noise , Acoustics
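
The TDoA-by-cross-correlation idea mentioned in the abstract for record 2 can be sketched as follows. This is an illustration only, not the paper's estimator: the free-field (pure delay) model used to turn a delay into an RTF is an assumption, and the function names `tdoa_cross_correlation` and `rtf_from_delay` are hypothetical.

```python
import numpy as np

def tdoa_cross_correlation(ref, mic, max_lag):
    """Lag (in samples) by which `mic` is delayed relative to `ref`."""
    n = len(ref)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            # sum over t of ref[t] * mic[t + lag]
            scores.append(np.dot(ref[:n - lag], mic[lag:]))
        else:
            scores.append(np.dot(ref[-lag:], mic[:n + lag]))
    return int(lags[int(np.argmax(scores))])

def rtf_from_delay(delay_samples, n_fft):
    """RTF per frequency bin under an assumed pure-delay (free-field) model."""
    k = np.arange(n_fft // 2 + 1)               # one-sided frequency bins
    return np.exp(-2j * np.pi * k * delay_samples / n_fft)
```

Under the pure-delay assumption the RTF has unit magnitude and linear phase; real rooms add reverberation, so this should be read as a rough starting point rather than a complete estimator.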
3.
Entropy (Basel) ; 25(1)2022 Dec 27.
Article in English | MEDLINE | ID: mdl-36673194

ABSTRACT

This paper tackles the problem of estimating the covariance matrix in large-dimension and small-sample-size scenarios. Inspired by the well-known linear shrinkage estimation, we propose a novel second-order Stein-type regularization strategy to generate well-conditioned covariance matrix estimators. We model the second-order Stein-type regularization as a quadratic polynomial concerning the sample covariance matrix and a given target matrix, representing the prior information of the actual covariance structure. To obtain available covariance matrix estimators, we choose the spherical and diagonal target matrices and develop unbiased estimates of the theoretical mean squared errors, which measure the distances between the actual covariance matrix and its estimators. We formulate the second-order Stein-type regularization as a convex optimization problem, resulting in the optimal second-order Stein-type estimators. Numerical simulations reveal that the proposed estimators can significantly lower the Frobenius losses compared with the existing Stein-type estimators. Moreover, a real data analysis in portfolio selection verifies the performance of the proposed estimators.
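
For context, the first-order linear shrinkage that the abstract cites as its inspiration can be sketched as below, using a spherical target. This is not the second-order Stein-type estimator proposed in the paper; the shrinkage weight `rho` is supplied by the caller here rather than optimized, and the function name `linear_shrinkage` is hypothetical.

```python
import numpy as np

def linear_shrinkage(X, rho):
    """Shrink the sample covariance toward a spherical target.

    rho in [0, 1] is an assumed, user-chosen weight; the paper instead
    derives optimal coefficients from estimated mean squared errors.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)      # sample covariance (divisor n)
    target = (np.trace(S) / p) * np.eye(p)      # spherical target, same trace as S
    return (1.0 - rho) * S + rho * target
```

When p exceeds n the sample covariance is singular, whereas any `rho > 0` yields a positive-definite, better-conditioned estimate with the same trace.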

4.
J Stat Plan Inference ; 213: 16-32, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33281277

ABSTRACT

We introduce an estimation method of covariance matrices in a high-dimensional setting, i.e., when the dimension of the matrix, p, is larger than the sample size n. Specifically, we propose an orthogonally equivariant estimator. The eigenvectors of such an estimator are the same as those of the sample covariance matrix. The eigenvalue estimates are obtained from an adjusted profile likelihood function derived by approximating the integral of the density function of the sample covariance matrix over its eigenvectors, which is a challenging problem in its own right. Exact solutions to the approximate likelihood equations are obtained and employed to construct estimates that involve a tuning parameter. Bootstrap and cross-validation based algorithms are proposed to choose this tuning parameter under various loss functions. Finally, comparisons with two well-known orthogonally equivariant estimators are given, which are based on Monte-Carlo risk estimates for simulated data and misclassification errors in real data analyses.
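
The structure of an orthogonally equivariant estimator, as described in the abstract above, is to keep the sample eigenvectors and replace only the eigenvalues. The sketch below shows that structure; the eigenvalue rule is passed in as a placeholder argument `adjust_eigs` (hypothetical), and the paper's adjusted-profile-likelihood eigenvalue estimates are not reproduced here.

```python
import numpy as np

def equivariant_estimator(X, adjust_eigs):
    """Keep sample eigenvectors; replace eigenvalues via `adjust_eigs`.

    `adjust_eigs` is a placeholder for any rule mapping the sample
    eigenvalues (ascending order) to adjusted eigenvalues.
    """
    S = np.cov(X, rowvar=False, bias=True)
    vals, vecs = np.linalg.eigh(S)              # S = vecs @ diag(vals) @ vecs.T
    new_vals = adjust_eigs(vals)
    return (vecs * new_vals) @ vecs.T           # same eigenvectors, new spectrum
```

For example, passing `lambda v: np.maximum(v, 1e-3)` would floor the small eigenvalues, which is one simple way to obtain an invertible estimate when p is larger than n.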

5.
J Econom ; 215(1): 118-130, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32773919

ABSTRACT

This paper develops a new estimation procedure for an ultrahigh dimensional sparse precision matrix, the inverse of the covariance matrix. Regularization methods have been proposed for sparse precision matrix estimation, but they may not perform well with ultrahigh dimensional data due to spurious correlation. We propose a refitted cross validation (RCV) method for sparse precision matrix estimation based on its Cholesky decomposition, which does not require the Gaussian assumption. The proposed RCV procedure can be easily implemented with existing software for ultrahigh dimensional linear regression. We establish the consistency of the proposed RCV estimation and show that the rate of convergence of the RCV estimation without assuming banded structure is the same as that of those assuming the banded structure in Bickel and Levina (2008b). Monte Carlo studies were conducted to assess the finite sample performance of the RCV estimation. Our numerical comparison shows that the RCV estimation outperforms the existing ones in various scenarios. We further apply the RCV estimation to an empirical analysis of asset allocation.
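
The link between a Cholesky-type decomposition of the precision matrix and a sequence of linear regressions, which the abstract above relies on, can be sketched as follows. This sketch uses plain least squares and therefore needs n larger than p; it omits the variable selection and the refitted cross-validation step of the paper, and the function name `cholesky_precision` is hypothetical.

```python
import numpy as np

def cholesky_precision(X):
    """Precision matrix via the modified Cholesky decomposition.

    Each variable is regressed on its predecessors: Omega = T' D^{-1} T,
    where T is unit lower triangular (negated regression coefficients)
    and D holds the residual variances.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    d = np.empty(p)
    d[0] = np.mean(Xc[:, 0] ** 2)
    for j in range(1, p):
        coef, *_ = np.linalg.lstsq(Xc[:, :j], Xc[:, j], rcond=None)
        resid = Xc[:, j] - Xc[:, :j] @ coef
        T[j, :j] = -coef                        # regression of X_j on X_1..X_{j-1}
        d[j] = np.mean(resid ** 2)              # residual variance
    return T.T @ np.diag(1.0 / d) @ T
```

In this unpenalized form the result coincides with the inverse of the sample covariance matrix. In the ultrahigh dimensional setting each regression would instead use a sparse fit, with the residual variance re-estimated on a held-out half of the sample, which is the role the abstract assigns to RCV.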

6.
Entropy (Basel) ; 20(4)2018 Apr 08.
Article in English | MEDLINE | ID: mdl-33265349

ABSTRACT

This paper presents a covariance matrix estimation method based on information geometry in heterogeneous clutter. In particular, the problem of covariance estimation is reformulated as the computation of the geometric median for covariance matrices estimated from the secondary data set. A new class of total Bregman divergence is presented on the Riemannian manifold of Hermitian positive-definite (HPD) matrices, which is the foundation of information geometry. On the basis of this divergence, total Bregman divergence medians are derived instead of the sample covariance matrix (SCM) of the secondary data. Unlike the SCM, which resorts to knowledge of the statistical characteristics of the sample data, our proposed estimators consider the geometric structure of the matrix space, and the performance can thus be improved in heterogeneous clutter. At the analysis stage, numerical results are given to validate the detection performance of an adaptive normalized matched filter with our estimator compared with existing alternatives.

7.
J Biopharm Stat ; 27(3): 387-398, 2017.
Article in English | MEDLINE | ID: mdl-28281937

ABSTRACT

Dichotomous endpoints in clinical trials have only two possible outcomes, either directly or via categorization of an ordinal or continuous observation. It is common to have missing data for one or more visits during a multi-visit study. This paper presents a closed form method for sensitivity analysis of a randomized multi-visit clinical trial that possibly has missing not at random (MNAR) dichotomous data. Counts of missing data are redistributed to the favorable and unfavorable outcomes mathematically to address possibly informative missing data. Adjusted proportion estimates and their closed form covariance matrix estimates are provided. Treatment comparisons over time are addressed with Mantel-Haenszel adjustment for a stratification factor and/or randomization-based adjustment for baseline covariables. The application of such sensitivity analyses is illustrated with an example. An appendix outlines an extension of the methodology to ordinal endpoints.


Subject(s)
Data Interpretation, Statistical , Randomized Controlled Trials as Topic , Research Design , Data Accuracy , Endpoint Determination , Humans , Sensitivity and Specificity
8.
J Comput Graph Stat ; 32(2): 601-612, 2023.
Article in English | MEDLINE | ID: mdl-37273839

ABSTRACT

The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience. Thus, a variety of estimators have been derived to overcome the shortcomings of the canonical estimator in such settings. Yet, selecting an optimal estimator from among the plethora available remains an open challenge. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. We propose a general class of loss functions for covariance matrix estimation and establish accompanying finite-sample risk bounds and conditions for the asymptotic optimality of the cross-validation selector. In numerical experiments, we demonstrate the optimality of our proposed selector in moderate sample sizes and across diverse data-generating processes. The practical benefits of our procedure are highlighted in a dimension reduction application to single-cell transcriptome sequencing data.
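
A cross-validated selector of the kind this abstract describes can be sketched as below. This is a simplified illustration, not the authors' procedure: the loss used here is the squared Frobenius distance between the training-fold estimate and the validation-fold sample covariance, which is an assumption made for brevity, and the names `cv_select` and `estimators` are hypothetical.

```python
import numpy as np

def cv_select(X, estimators, n_folds=5, seed=0):
    """Pick the candidate estimator with the smallest cross-validated risk.

    `estimators` maps a name to a function taking an (n, p) data matrix
    and returning a (p, p) covariance estimate.
    """
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    risks = {name: 0.0 for name in estimators}
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        # Validation-fold sample covariance stands in for the unknown truth.
        S_val = np.cov(X[fold], rowvar=False, bias=True)
        for name, est in estimators.items():
            diff = est(X[train]) - S_val
            risks[name] += np.sum(diff ** 2) / n_folds
    best = min(risks, key=risks.get)
    return best, risks
```

A typical call would pass a dictionary such as `{"sample": lambda A: np.cov(A, rowvar=False, bias=True), "diagonal": lambda A: np.diag(A.var(axis=0))}` and read off the name with the lowest estimated risk.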

9.
Electron J Stat ; 15(2): 4192-4235, 2021.
Article in English | MEDLINE | ID: mdl-35782590

ABSTRACT

This manuscript presents an approach to perform generalized linear regression with multiple high dimensional covariance matrices as the outcome. In many areas of study, such as resting-state functional magnetic resonance imaging (fMRI) studies, this type of regression can be utilized to characterize variation in the covariance matrices across units. Model parameters are estimated by maximizing a likelihood formulation of a generalized linear model, conditioning on a well-conditioned linear shrinkage estimator for multiple covariance matrices, where the shrinkage coefficients are proposed to be shared across matrices. Theoretical studies demonstrate that the proposed covariance matrix estimator is optimal, achieving the uniformly minimum quadratic loss asymptotically among all linear combinations of the identity matrix and the sample covariance matrix. Under certain regularity conditions, the proposed estimator of the model parameters is consistent. The superior performance of the proposed approach over existing methods is illustrated through simulation studies. Applied to a resting-state fMRI study acquired from the Alzheimer's Disease Neuroimaging Initiative, the proposed approach identified a brain network within which functional connectivity is significantly associated with Apolipoprotein E ε4, a strong genetic marker for Alzheimer's disease.

10.
J Appl Stat ; 47(6): 1064-1083, 2020.
Article in English | MEDLINE | ID: mdl-35706920

ABSTRACT

Various gene network models with distinct physical nature have been widely used in biological studies. For temporal transcriptomic studies, the current dynamic models either ignore the temporal variation in the network structure or fail to scale up to a large number of genes due to severe computational bottlenecks and sample size limitation. Although the correlation-based gene networks are computationally affordable, they have limitations after being applied to gene expression time-course data. We proposed Temporal Gene Coexpression Network Analysis (TGCnA) framework for the transcriptomic time-course data. The mathematical nature of TGCnA is the joint modeling of multiple covariance matrices across time points using a 'low-rank plus sparse' framework, in which the network similarity across time points is explicitly modeled in the low-rank component. We demonstrated the advantage of TGCnA in covariance matrix estimation and gene module discovery using both simulation data and real transcriptomic data. The code is available at https://github.com/QiZhangStat/TGCnA.

11.
Spectrochim Acta A Mol Biomol Spectrosc ; 228: 117836, 2020 Mar 05.
Article in English | MEDLINE | ID: mdl-31771907

ABSTRACT

Octane number is an anti-knock index of fuel gasoline, which has an important impact on the service life of engine components and the safety of vehicles. Accurate prediction of the gasoline octane number is therefore a basic safety-related task. This work aimed to predict the octane number from near infrared (NIR) spectroscopy by combining a dimension reduction algorithm with a neural network. Covariance matrix estimation (CME), known as a mathematical statistic tool, was applied to estimate the intrinsic dimensions of the octane spectrum dataset. Landmark-Isometric feature mapping (L-Isomap), as a novel manifold learning algorithm, was used for dimensionality reduction of spectral data. A new method, beetle antennae search optimization BP neural network (BAS-BP), was proposed to realize the prediction of octane number. In order to verify the performance of the CME-L-Isomap-BAS-BP model presented in this paper, it is compared with other models. The results showed that when CME-L-Isomap was combined with BAS-BP, the average recovery rate (AR), mean square error (MSE), mean absolute percentage error (MAPE), correlation coefficient (R) and running time were superior to those of other models. The satisfying results demonstrated that the CME-L-Isomap-BAS-BP model is more suitable for prediction of gasoline octane number.

12.
Econometrica ; 83(4): 1497-1541, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26778846

ABSTRACT

We propose a novel technique to boost the power of testing a high-dimensional vector H : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a "power enhancement component", which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models.

13.
SIAM J Imaging Sci ; 8(1): 126-185, 2015 Jan 22.
Article in English | MEDLINE | ID: mdl-25699132

ABSTRACT

In cryo-electron microscopy (cryo-EM), a microscope generates a top view of a sample of randomly oriented copies of a molecule. The problem of single particle reconstruction (SPR) from cryo-EM is to use the resulting set of noisy two-dimensional projection images taken at unknown directions to reconstruct the three-dimensional (3D) structure of the molecule. In some situations, the molecule under examination exhibits structural variability, which poses a fundamental challenge in SPR. The heterogeneity problem is the task of mapping the space of conformational states of a molecule. It has been previously suggested that the leading eigenvectors of the covariance matrix of the 3D molecules can be used to solve the heterogeneity problem. Estimating the covariance matrix is challenging, since only projections of the molecules are observed, but not the molecules themselves. In this paper, we formulate a general problem of covariance estimation from noisy projections of samples. This problem has intimate connections with matrix completion problems and high-dimensional principal component analysis. We propose an estimator and prove its consistency. When there are finitely many heterogeneity classes, the spectrum of the estimated covariance matrix reveals the number of classes. The estimator can be found as the solution to a certain linear system. In the cryo-EM case, the linear operator to be inverted, which we term the projection covariance transform, is an important object in covariance estimation for tomographic problems involving structural variation. Inverting it involves applying a filter akin to the ramp filter in tomography. We design a basis in which this linear operator is sparse and thus can be tractably inverted despite its large size. We demonstrate via numerical experiments on synthetic datasets the robustness of our algorithm to high levels of noise.
