Results 1 - 20 of 120
1.
J Am Stat Assoc ; 118(542): 1000-1010, 2023.
Article in English | MEDLINE | ID: mdl-37347088

ABSTRACT

When data are stored in a distributed manner, direct application of traditional statistical inference procedures is often prohibitive due to communication costs and privacy concerns. This paper develops and investigates two Communication-Efficient Accurate Statistical Estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information back to the node machines for new updates. The algorithms adapt to the similarity among the loss functions on the node machines and converge rapidly when each node machine has a large enough sample size. Moreover, they do not require good initialization and enjoy linear convergence guarantees under general conditions. The contraction rate of the optimization errors is presented explicitly, with its dependence on the local sample size unveiled. In addition, the improvement in statistical accuracy per iteration is derived. By regarding the proposed method as a multi-step statistical estimator, we show that statistical efficiency can be achieved in finitely many steps in typical statistical applications. We also give conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.
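
As a rough illustration of the communication pattern (not the paper's exact algorithm, whose surrogate loss and aggregation rule are more refined), here is a minimal Python sketch of one gradient-enhanced round for distributed logistic regression; the function names and the inner gradient-descent solver are our own choices:

```python
import numpy as np

# Minimal sketch of one gradient-enhanced round for distributed logistic
# regression, in the spirit of CEASE. Only the communication pattern is
# faithful (one gradient aggregation, one broadcast per iteration); the
# paper's surrogate loss and aggregation scheme are more refined.

def local_grad(theta, X, y):
    """Gradient of the average logistic loss on one node machine."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

def cease_style_round(theta, shards, lr=0.1, inner_steps=100):
    # Node machines compute local gradients in parallel; the center averages
    # them and broadcasts the result.
    grad_global = np.mean([local_grad(theta, X, y) for X, y in shards], axis=0)
    solutions = []
    for X, y in shards:
        shift = local_grad(theta, X, y) - grad_global  # gradient enhancement
        th = theta.copy()
        for _ in range(inner_steps):
            # gradient of the shifted (gradient-enhanced) local loss
            th -= lr * (local_grad(th, X, y) - shift)
        solutions.append(th)
    return np.mean(solutions, axis=0)  # center aggregates the local solutions
```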

2.
J Am Stat Assoc ; 118(542): 858-868, 2023.
Article in English | MEDLINE | ID: mdl-37313368

ABSTRACT

We investigate the effectiveness of convex relaxation and nonconvex optimization for solving bilinear systems of equations under two different designs (i.e., a random Fourier design and a Gaussian design). Despite their wide applicability, theoretical understanding of these two paradigms remains largely inadequate in the presence of random noise. The current paper makes two contributions by demonstrating that: (1) a two-stage nonconvex algorithm attains minimax-optimal accuracy within a logarithmic number of iterations, and (2) convex relaxation also achieves minimax-optimal statistical accuracy vis-à-vis random noise. Both results significantly improve upon the state-of-the-art theoretical guarantees.

3.
J Am Stat Assoc ; 118(544): 2315-2328, 2023.
Article in English | MEDLINE | ID: mdl-38550788

ABSTRACT

In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model, and we provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To better understand the role played by implicit regularization without excess technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to this loss function. When the initialization is close to the origin and the stepsize is sufficiently small, we prove that the obtained solution achieves minimax-optimal statistical rates of convergence in both the vector and matrix cases. Our experimental results support the theoretical findings and also demonstrate that our methods empirically outperform classical methods with explicit regularization in terms of both the ℓ2 statistical rate and variable selection consistency.
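
To illustrate the implicit-regularization mechanism in its simplest form, the sketch below runs regularization-free gradient descent on an over-parameterized least-squares loss for a sparse linear signal, using the standard Hadamard parameterization β = u⊙u − v⊙v with small initialization; the paper's actual loss additionally involves a score-function transform and a truncation step for heavy-tailed data, both omitted here:

```python
import numpy as np

# Sketch of regularization-free gradient descent with over-parameterization:
# parameterize beta = u*u - v*v (elementwise), initialize near the origin,
# and run plain gradient descent on the unpenalized least-squares loss.
# Small initialization and stepsize drive the implicit bias toward sparsity.

rng = np.random.default_rng(0)
n, d, s = 200, 500, 5
beta_star = np.zeros(d); beta_star[:s] = 1.0
X = rng.standard_normal((n, d))
y = X @ beta_star + 0.1 * rng.standard_normal(n)

alpha, lr, T = 1e-3, 1e-2, 5000
u = alpha * np.ones(d)
v = alpha * np.ones(d)
for _ in range(T):
    beta = u * u - v * v
    g = X.T @ (X @ beta - y) / n   # gradient of 0.5 * mean squared error
    u -= lr * 2 * g * u            # chain rule through the Hadamard factors
    v += lr * 2 * g * v
beta_hat = u * u - v * v
print(np.linalg.norm(beta_hat - beta_star))
```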

4.
Ann Stat ; 50(1): 460-486, 2022 Feb.
Article in English | MEDLINE | ID: mdl-36148472

ABSTRACT

We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of the eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to the LASSO and Principal Component Regression (PCR). A theoretical analysis is provided for both fixed-design and random-design settings. The obtained bounds on the mean squared error and the prediction error of a specific estimator from the family allow us to clearly state sufficient conditions on the decay of the eigenvalues to ensure convergence. In addition, we promote the use of relative errors, which are strongly linked to out-of-sample R². The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. Numerical simulations confirm the good performance of the proposed estimators compared to previously developed methods.
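
The sketch below is one plausible reading of the construction: rotate the design into the eigenbasis of the sample covariance, estimate coefficients coordinate-wise there, keep the largest ones, and rotate back. The paper's estimator uses a specific data-driven threshold that this illustration replaces with a fixed number k of retained coordinates:

```python
import numpy as np

# Illustrative sketch of thresholding in the canonical (eigen) basis. This
# conveys the PCR/LASSO-like flavor described in the abstract; it is not the
# paper's exact canonical thresholding estimator.

def canonical_threshold(X, y, k):
    n = len(y)
    S = X.T @ X / n                       # sample covariance (X assumed centered)
    evals, U = np.linalg.eigh(S)
    Z = X @ U                             # canonical coordinates
    coef = Z.T @ y / n                    # cross-covariances in the eigenbasis
    with np.errstate(divide="ignore", invalid="ignore"):
        theta = np.where(evals > 1e-12, coef / evals, 0.0)  # per-coordinate OLS
    keep = np.argsort(-np.abs(coef))[:k]  # keep the k largest canonical terms
    theta_thr = np.zeros_like(theta)
    theta_thr[keep] = theta[keep]
    return U @ theta_thr                  # back to the original coordinates
```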

5.
J Am Stat Assoc ; 117(538): 996-1009, 2022.
Article in English | MEDLINE | ID: mdl-36060554

ABSTRACT

Characterizing the asymptotic distributions of eigenvectors for large random matrices poses important challenges yet can provide useful insights into a range of statistical applications. To this end, in this paper we introduce a general framework for the asymptotic theory of eigenvectors (ATE) for large spiked random matrices with diverging spikes and heterogeneous variances, and establish the asymptotic properties of the spiked eigenvectors and eigenvalues under generalized Wigner matrix noise. Under some mild regularity conditions, we provide asymptotic expansions for the spiked eigenvalues and show that they are asymptotically normal after some normalization. For the spiked eigenvectors, we establish asymptotic expansions for a general linear combination and further show that it is asymptotically normal after some normalization, where the weight vector can be arbitrary. We also provide a more general asymptotic theory for the spiked eigenvectors using the bilinear form. Simulation studies verify the validity of our new theoretical results. Our family of models encompasses many popular ones, such as stochastic block models with or without overlapping communities for network analysis and topic models for text analysis, and our general theory can be exploited for statistical inference in these large-scale applications.

6.
Ann Stat ; 50(2): 615-639, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35814863

ABSTRACT

Understanding statistical inference under possibly non-sparse high-dimensional models has gained much interest recently. For a given component of the regression coefficient, we show that the difficulty of the problem depends on the sparsity of the corresponding row of the precision matrix of the covariates, not the sparsity of the regression coefficients. We develop new concepts of uniform and essentially uniform non-testability that allow the study of limitations of tests across a broad set of alternatives. Uniform non-testability identifies a collection of alternatives such that the power of any test, against any alternative in this collection, is asymptotically at most equal to the nominal size. Implications of the new constructions include new minimax testability results that, in sharp contrast to the current results, do not depend on the sparsity of the regression parameters. We identify new tradeoffs between testability and feature correlation. In particular, we show that, in models with weak feature correlations, the minimax lower bound can be attained by a test whose power has the √n rate, regardless of the size of the model sparsity.

7.
J Econom ; 230(1): 3-19, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35754940

ABSTRACT

Many sparse regression methods rest on the assumption that covariates are weakly correlated, which unfortunately does not hold in many economic and financial datasets. To address this challenge, we model the strongly correlated covariates by a factor structure: strong correlations among covariates are explained by common factors, and the remaining variations are interpreted as idiosyncratic components. We then propose a factor-adjusted sparse regression model, with both the common factors and the idiosyncratic components serving as decorrelated covariates, and develop a semi-Bayesian method. Rate-optimality of parameter estimation and model selection consistency are established by non-asymptotic analyses. We show on simulated data that the semi-Bayesian method outperforms its Lasso analogue, is insensitive to overestimates of the number of common factors, pays a negligible price when covariates are not correlated, scales up well with increasing sample size, dimensionality and sparsity, and converges quickly to the equilibrium of the posterior distribution. Numerical results on a real dataset of U.S. bond risk premia and macroeconomic indicators also lend strong support to the proposed method.
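
A minimal sketch of the factor-adjustment idea, using the Lasso analogue mentioned above (the semi-Bayesian estimator itself is more involved); PCA plays the role of the factor estimator, and the number of factors is assumed given:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

# Sketch of factor-adjusted sparse regression: extract common factors by PCA,
# take the reconstruction residuals as idiosyncratic components, and fit a
# sparse regression on the decorrelated design [factors, idiosyncratics].

def factor_adjusted_lasso(X, y, n_factors):
    pca = PCA(n_components=n_factors)
    F = pca.fit_transform(X)             # estimated common factors
    U = X - pca.inverse_transform(F)     # idiosyncratic components
    Z = np.hstack([F, U])                # decorrelated covariates
    fit = LassoCV(cv=5).fit(Z, y)
    return fit.coef_[:n_factors], fit.coef_[n_factors:]
```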

8.
Stoch Process Their Appl ; 150: 802-818, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35756192

ABSTRACT

High-dimensional linear regression has been intensively studied in the statistics community over the last two decades. For the convenience of theoretical analysis, classical methods usually assume independent observations and sub-Gaussian-tailed errors. However, neither assumption holds in many real high-dimensional time-series data. Recently, [Sun, Zhou, Fan, 2019, J. Amer. Stat. Assoc., in press] proposed Adaptive Huber Regression (AHR) to address the issue of heavy-tailed errors, discovering that the robustification parameter of the Huber loss should adapt to the sample size, the dimensionality, and the moments of the heavy-tailed errors. We extend this line of work in a complementary direction and justify AHR for dependent observations. Specifically, we consider an important dependence structure: Markov dependence. Our results show that Markov dependence affects both the adaptation of the robustification parameter and the estimation of the regression coefficients, in that the sample size should be discounted by a factor depending on the spectral gap of the underlying Markov chain.

9.
Natl Sci Rev ; 9(2): nwab183, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35242339

ABSTRACT

Clustering is the discovery of latent group structure in data; it is a fundamental problem in artificial intelligence and a vital procedure in data-driven scientific research across all disciplines. Yet existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when it comes to clustering the massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes following the Weber-Fechner law, capturing the gradually emerging, meaningful grouping structure. We validated our approach on real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expressions, face images, computer logs and audio recordings. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains.

10.
Ann Stat ; 49(3): 1239-1266, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34556893

ABSTRACT

This paper introduces a simple principle for robust statistical inference via appropriate shrinkage on the data. This widens the scope of high-dimensional techniques, relaxing the distributional conditions from sub-exponential or sub-Gaussian tails to the weaker requirement of bounded second or fourth moments. As an illustration of this principle, we focus on robust estimation of the low-rank matrix Θ* from the trace regression model Y = Tr(Θ*⊤ X) + ϵ. It encompasses four popular problems: sparse linear models, compressed sensing, matrix completion and multi-task learning. We propose to apply the penalized least-squares approach to appropriately truncated or shrunk data. Under only a bounded (2+δ)-th moment condition on the response, the proposed robust methodology yields an estimator with the same statistical error rates as the previous literature obtains under sub-Gaussian errors. For the sparse linear model and multi-task regression, we further allow the design to have only a bounded fourth moment and obtain the same statistical rates. As a byproduct, when the samples bear only bounded fourth moments, we give a robust covariance estimator with a concentration inequality and an optimal rate of convergence in terms of the spectral norm. This result is of independent interest and importance. We reveal that in high dimensions the sample covariance matrix is not optimal, whereas our proposed robust covariance estimator achieves optimality. Extensive simulations are carried out to support the theory.
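
For the sparse linear model, the principle amounts to winsorizing the responses before running an ordinary penalized least-squares fit. The sketch below makes that concrete; the truncation level τ must grow with the sample size, and the particular rate and constants used here are illustrative guesses rather than the paper's prescription:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch of the data-shrinkage principle for the sparse linear model:
# truncate (winsorize) heavy-tailed responses at level tau, then run an
# ordinary penalized least-squares fit on the shrunk data.

def truncated_lasso(X, y, alpha=0.05, c=2.0):
    n = len(y)
    tau = c * np.sqrt(np.mean(y ** 2)) * n ** 0.25  # heuristic truncation level
    y_shrunk = np.clip(y, -tau, tau)                # shrinkage on the data
    return Lasso(alpha=alpha).fit(X, y_shrunk)
```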

11.
J Mach Learn Res ; 22, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34566520

ABSTRACT

This paper establishes Hoeffding's lemma and inequality for bounded functions of general-state-space, not necessarily reversible, Markov chains. The sharpness of these results is characterized by the optimality of the ratio between the variance proxies in the Markov-dependent and independent settings. The boundedness of the functions is shown to be necessary for such results to hold in general. To showcase the usefulness of the new results, we apply them to non-asymptotic analyses of MCMC estimation, respondent-driven sampling and high-dimensional covariance matrix estimation on time-series data with a Markovian nature. Beyond statistical problems, we apply them to study time-discounted rewards in econometric models and the multi-armed bandit problem with Markovian rewards arising from the field of machine learning.
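
For orientation, here is the classical Hoeffding bound for independent, bounded f(X_i), together with the way its variance proxy inflates under Markov dependence; the (1+λ)/(1−λ) ratio below, with λ tied to the chain's spectral gap, is the quantity whose optimality the paper characterizes, and the constants shown are schematic rather than the paper's exact statement:

```latex
\[
  \mathbb{P}\!\left(\left|\frac{1}{n}\sum_{i=1}^{n} f(X_i) - \pi(f)\right| \ge t\right)
  \le 2\exp\!\left(-\frac{n t^{2}}{2\nu^{2}}\right),
  \qquad
  \nu^{2} =
  \begin{cases}
    \dfrac{(b-a)^{2}}{4}, & X_i \text{ independent},\\[1.5ex]
    \dfrac{(b-a)^{2}}{4}\cdot\dfrac{1+\lambda}{1-\lambda}, & X_i \text{ Markov-dependent},
  \end{cases}
\]
```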

12.
Stat Sci ; 36(2): 303-327, 2021 May.
Article in English | MEDLINE | ID: mdl-34321713

ABSTRACT

Factor models are a class of powerful statistical models that have been widely used to deal with dependent measurements arising frequently in applications ranging from genomics and neuroscience to economics and finance. As data are collected at an ever-growing scale, statistical machine learning faces new challenges: high dimensionality, strong dependence among observed variables, heavy-tailed variables and heterogeneity. High-dimensional robust factor analysis serves as a powerful toolkit for conquering these challenges. This paper gives a selective overview of recent advances in high-dimensional factor models and their applications to statistics, including Factor-Adjusted Robust Model selection (FarmSelect) and Factor-Adjusted Robust Multiple testing (FarmTest). We show that classical methods, especially principal component analysis (PCA), can be tailored to many new problems and provide powerful tools for statistical estimation and inference. We highlight PCA and its connections to matrix perturbation theory, robust statistics, random projection and false discovery rate, and illustrate through several applications how insights from these fields yield solutions to modern challenges. We also present far-reaching connections between factor models and popular statistical learning problems, including network analysis and low-rank matrix recovery.

13.
Ann Stat ; 49(1): 435-458, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34305194

ABSTRACT

This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1, symmetric matrix M⋆ ∈ ℝ^{n×n}, yet only a randomly perturbed version M is observed. The noise matrix M − M⋆ is composed of independent (but not necessarily homoscedastic) entries and is therefore not symmetric in general. This might arise if, for example, we have two independent samples for each entry of M⋆ and arrange them in an asymmetric fashion. The aim is to estimate the leading eigenvalue and the leading eigenvector of M⋆. We demonstrate that, for eigenvalue estimation, the leading eigenvalue of the data matrix M can be O(√n) times more accurate (up to some log factor) than the (unadjusted) leading singular value of M. Moreover, the eigen-decomposition approach is fully adaptive to heteroscedasticity of the noise, without any prior knowledge about the noise distributions. In a nutshell, this curious phenomenon arises because the statistical asymmetry automatically mitigates the bias of the eigenvalue approach, eliminating the need for careful bias correction. Additionally, we develop appealing non-asymptotic eigenvector perturbation bounds; in particular, we are able to bound the perturbation of any linear function of the leading eigenvector of M (e.g., entrywise eigenvector perturbation). We also provide partial theory for the more general rank-r case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigen-decomposition can sometimes be quite beneficial.
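
A quick simulation conveys the phenomenon: generate a rank-1 symmetric M⋆, add i.i.d. (hence asymmetric) noise, and compare the leading eigenvalue and the leading singular value of the observed matrix as estimators of the true spike. The sizes and noise level below are arbitrary choices:

```python
import numpy as np

# Small experiment illustrating "asymmetry helps": with independent,
# non-symmetrized noise, the leading eigenvalue of the noisy matrix tends to
# estimate the true spike far better than the leading singular value, whose
# bias (~ ||noise||^2 / spike) the eigenvalue approach automatically avoids.

rng = np.random.default_rng(1)
n = 1000
u = rng.standard_normal(n); u /= np.linalg.norm(u)
lam = 15.0
M_star = lam * np.outer(u, u)                  # rank-1, symmetric truth
H = rng.standard_normal((n, n)) / np.sqrt(n)   # iid noise, NOT symmetrized
M = M_star + H

eigval = np.max(np.linalg.eigvals(M).real)     # leading eigenvalue of M
sval = np.linalg.svd(M, compute_uv=False)[0]   # leading singular value of M
print("eigenvalue error:    ", abs(eigval - lam))
print("singular value error:", abs(sval - lam))
```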

14.
Stat Sci ; 36(2): 264-290, 2021 May.
Article in English | MEDLINE | ID: mdl-34305305

ABSTRACT

Deep learning has achieved tremendous success in recent years. In simple terms, deep learning uses the composition of many nonlinear functions to model the complex dependency between input features and labels. While neural networks have a long history, recent advances have greatly improved their performance in computer vision, natural language processing, and other areas. From the statistical and scientific perspective, it is natural to ask: What is deep learning? What are the new characteristics of deep learning compared with classical methods? What are its theoretical foundations? To answer these questions, we introduce common neural network models (e.g., convolutional neural nets, recurrent neural nets, generative adversarial nets) and training techniques (e.g., stochastic gradient descent, dropout, batch normalization) from a statistical point of view. Along the way, we highlight new characteristics of deep learning (including depth and over-parametrization) and explain their practical and theoretical benefits. We also sample recent results on theories of deep learning, many of which are only suggestive. While a complete understanding of deep learning remains elusive, we hope that our perspectives and discussions serve as a stimulus for new statistical research.

15.
ArXiv ; 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33442559

ABSTRACT

Motivated by the severity of the COVID-19 outbreak, we characterize the growth trajectories of counties in the United States using a novel combination of spectral clustering and the correlation matrix. As the U.S. and the rest of the world experience a severe second wave of infections, the importance of assigning growth membership to counties and understanding the determinants of that growth is increasingly evident. Subsequently, we select the demographic features that are most statistically significant in distinguishing the communities. Lastly, we predict the future growth of a given county with an LSTM using three social distancing scores. This comprehensive study captures the growth of county-level cases at a very micro level, using growth communities, demographic factors, and social distancing performance to help government agencies decide which counties to target with resources and funding.
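
A schematic version of the clustering step, with the correlation matrix of county growth trajectories supplied to spectral clustering as a precomputed affinity; the variable names and the affine rescaling of correlations are our assumptions, not the paper's exact pipeline:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Sketch of clustering counties by growth trajectory: build the correlation
# matrix across counties' time series, rescale it to a nonnegative affinity,
# and feed it to spectral clustering.

def cluster_counties(growth, n_clusters=4):
    # growth: (n_counties, n_days) array of case-growth trajectories
    corr = np.corrcoef(growth)
    affinity = (corr + 1.0) / 2.0   # map correlations from [-1, 1] to [0, 1]
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(affinity)
```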

16.
Ann Stat ; 49(5): 2948-2971, 2021 Oct.
Article in English | MEDLINE | ID: mdl-36148268

ABSTRACT

This paper delivers improved theoretical guarantees for the convex programming approach to low-rank matrix estimation in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal, and we strengthen it in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy in terms of both the Euclidean loss and the ℓ∞ loss. All of this happens even when nearly a constant fraction of the observations are corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm; hence the title of this paper.
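
A representative convex program of this type, stated for observations restricted to an index set Ω, with L the low-rank component, S the sparse outliers, and λ, τ tuning parameters (the paper's exact program and tuning may differ in details):

```latex
% L: low-rank estimate; S: sparse-outlier estimate; P_Omega: projection onto
% the observed entries; lambda, tau: tuning parameters.
\[
  \min_{L,\,S}\;\;
  \tfrac{1}{2}\bigl\|\mathcal{P}_{\Omega}(M - L - S)\bigr\|_{F}^{2}
  \;+\; \lambda \|L\|_{*} \;+\; \tau \|S\|_{1}
\]
```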

17.
Adv Neural Inf Process Syst ; 34: 16671-16685, 2021 Dec.
Article in English | MEDLINE | ID: mdl-36168331

ABSTRACT

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting, where the state space S and the action space A are both finite, obtaining a nearly optimal policy with sampling access to a generative model incurs a minimax-optimal sample complexity that scales linearly with |S| × |A|, which can be prohibitively large when S or A is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an ε-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of K/((1−γ)³ε²) (resp. K/((1−γ)⁴ε²)), up to some logarithmic factor. Here K is the feature dimension and γ ∈ (0, 1) is the discount factor of the MDP. Both sample complexity bounds are provably tight, and our result for the model-based approach matches the minimax lower bound. Our results show that for arbitrarily large-scale MDPs, both the model-based approach and Q-learning are sample-efficient when K is relatively small; hence the title of this paper.

18.
Stat Sin ; 31(1): 333-360, 2021 Jan.
Article in English | MEDLINE | ID: mdl-35046630

ABSTRACT

Current workflows for colocalization analysis in fluorescence microscopic imaging introduce significant bias through the user's choice of region of interest (ROI). In this work, we introduce an automatic, unbiased structured detection method for detecting correlated regions between two random processes observed on a common domain. We argue that, although intuitive, directly using the maximum log-likelihood statistic suffers from potential bias and substantially reduced power. We therefore introduce a simple size-based normalization to overcome this problem. We show that scanning with the proposed statistic leads to optimal correlated region detection over a large collection of structured correlation detection problems.

19.
J Econom ; 218(1): 119-139, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33208987

ABSTRACT

Measuring conditional dependence is an important topic in econometrics, with broad applications including graphical models. Under a factor model setting, a new conditional dependence measure based on projection is proposed. The corresponding conditional independence test is developed, and its asymptotic null distribution is unveiled even when the number of factors is high-dimensional. It is also shown that the new test controls the asymptotic type I error and can be computed efficiently. A generic method for building dependency graphs without the Gaussian assumption using the new test is elaborated. We show the superiority of the new method, implemented in the R package pgraph, through simulation and real data studies.
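
The simplest linear instance of a projection-based check, offered only as intuition (the measure and test in the paper, as implemented in pgraph, are more general): project each variable onto the factors and test whether the residuals remain correlated.

```python
import numpy as np
from scipy import stats

# Simplest linear instance of a projection-based conditional dependence
# check: regress each variable on the observed factors, then test whether
# the residuals are correlated. Intuition only; not the paper's statistic.

def projected_corr_test(x, y, F):
    # x, y: (n,) arrays; F: (n, k) matrix of observed factors
    Fc = np.column_stack([np.ones(len(x)), F])           # add an intercept
    rx = x - Fc @ np.linalg.lstsq(Fc, x, rcond=None)[0]  # project factors out of x
    ry = y - Fc @ np.linalg.lstsq(Fc, y, rcond=None)[0]  # ... and out of y
    r, p = stats.pearsonr(rx, ry)
    return r, p
```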

20.
J Am Stat Assoc ; 115(529): 254-265, 2020.
Article in English | MEDLINE | ID: mdl-33139964

ABSTRACT

Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for an optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with a bounded (1 + δ)-th moment for any δ > 0. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when δ ≥ 1, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime 0 < δ < 1, and the transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive.
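
A bare-bones sketch of the adaptation principle: minimize the Huber loss with a robustification parameter that grows with the sample size relative to the dimension. The scaling below mirrors the τ ≍ σ√(n/(d + log n)) heuristic for the δ ≥ 1 regime; the constant and the crude scale estimate are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of adaptive Huber regression: the robustification parameter tau
# adapts to the sample size n and dimension d, trading bias for robustness.

def huber_loss(beta, X, y, tau):
    r = y - X @ beta
    quad = np.abs(r) <= tau
    return np.sum(np.where(quad, 0.5 * r ** 2,
                           tau * np.abs(r) - 0.5 * tau ** 2))

def adaptive_huber(X, y):
    n, d = X.shape
    sigma_hat = np.std(y)                            # crude scale estimate
    tau = sigma_hat * np.sqrt(n / (d + np.log(n)))   # adaptive robustification
    res = minimize(huber_loss, np.zeros(d), args=(X, y, tau),
                   method="L-BFGS-B")
    return res.x
```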
