Results 1 - 20 of 21
1.
bioRxiv ; 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38915705

ABSTRACT

Arterial thrombosis, which represents a critical complication of cardiovascular diseases, is a leading cause of death and disability worldwide, yet no effective bioassay exists for its clinical prediction. As a hallmark of arterial thrombosis, severe stenosis in the blood vessel creates a high-shear, high-gradient flow environment that effectively facilitates platelet aggregation towards vessel occlusion even with platelet amplification loops inhibited. However, no approach is currently available to comprehensively characterize the size, composition and platelet activation status of thrombi forming under this biorheological condition. Here, we present a thrombus profiling assay that monitors the multi-dimensional attributes of thrombi forming in conditions mimicking the physiological scenario of arterial thrombosis. Using this platform, we demonstrate that different receptor-ligand interactions contribute distinctively to the composition and activation status of the thrombus. Our investigation into hypertensive and older individuals reveals intensified biomechanical thrombogenesis and multi-dimensional thrombus profile abnormalities, demonstrating a direct contribution of mechanobiology to arterial thrombosis and endorsing the diagnostic potential of the assay. Furthermore, we identify the hyperactivity of the GPIbα-integrin αIIbβ3 mechanosensing axis as a molecular mechanism that contributes to hypertension-associated arterial thrombosis. By studying the interactions between anti-thrombotic inhibitors and hypertension, and the inter-individual variability in personal thrombus profiles, our work reveals a critical need for personalized anti-thrombotic drug selection that accommodates each patient's pathological profile.

2.
J Am Stat Assoc ; 119(545): 715-729, 2024.
Article in English | MEDLINE | ID: mdl-38818252

ABSTRACT

It is important to develop statistical techniques to analyze high-dimensional data in the presence of both complex dependence and possible heavy tails and outliers in real-world applications such as imaging data analyses. We propose a new robust high-dimensional regression with coefficient thresholding, in which an efficient nonconvex estimation procedure is built on a thresholding function and the robust Huber loss. The proposed regularization method accounts for complex dependence structures in predictors and is robust against heavy tails and outliers in outcomes. Theoretically, we rigorously analyze the landscape of the population and empirical risk functions for the proposed method. The fine landscape enables us to establish both statistical consistency and computational convergence under the high-dimensional setting. We also present an extension to incorporate spatial information into the proposed method. Finite-sample properties of the proposed methods are examined by extensive simulation studies. An application concerns a scalar-on-image regression analysis of the association between psychiatric disorder, measured by the general factor of psychopathology, and features extracted from the task functional MRI data in the Adolescent Brain Cognitive Development (ABCD) study.
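The two ingredients the abstract names, the robust Huber loss and a thresholding function applied to coefficients, can be illustrated with a minimal Python sketch. This is not the authors' estimator: the hard-thresholding rule, the default `delta = 1.345` (a common Huber tuning constant) and the helper names `huber_loss`, `huber_score`, `hard_threshold` and `empirical_risk` are assumptions for illustration, and the paper's treatment of predictor dependence and spatial information is not shown.

```python
"""Hypothetical sketch: Huber loss plus hard-thresholded linear coefficients."""


def huber_loss(r: float, delta: float = 1.345) -> float:
    """Quadratic for |r| <= delta and linear beyond, so outliers weigh less."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def huber_score(r: float, delta: float = 1.345) -> float:
    """Derivative of huber_loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def hard_threshold(beta: float, tau: float) -> float:
    """Keep a coefficient only if its magnitude exceeds tau (assumed rule)."""
    return beta if abs(beta) > tau else 0.0


def empirical_risk(x, y, beta, tau, delta=1.345):
    """Average Huber loss of a linear model whose coefficients are thresholded."""
    b = [hard_threshold(bj, tau) for bj in beta]
    total = 0.0
    for xi, yi in zip(x, y):
        fitted = sum(xij * bj for xij, bj in zip(xi, b))
        total += huber_loss(yi - fitted, delta)
    return total / len(y)
```

With `delta = 1.0`, a residual of 10 contributes 9.5 rather than the 50 a squared-error loss would assign, which is the robustness to heavy-tailed outcomes the abstract refers to.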

3.
J Multivar Anal ; 202, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38525479

ABSTRACT

We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data, modeled as members of a metric space. Our key step is to build universal kernels (cc-universal) on the metric spaces, which results in reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the Wasserstein distance, while for multivariate distributions, we resort to the sliced Wasserstein distance. The sliced Wasserstein distance ensures that the metric space possesses similar topological properties to the Wasserstein space, while also offering significant computational benefits. Numerical results based on synthetic data show that our method outperforms possible competing methods. The method is also applied to several data sets, including fertility and mortality data and Calgary temperature data.
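For univariate distributions represented by equal-size samples, the Wasserstein distance reduces to matching sorted values, and the sliced version averages it over random one-dimensional projections. The sketch below assumes equal sample sizes, Monte Carlo directions and a Gaussian-type kernel `exp(-d^2 / gamma)` built on the resulting metric; these choices and all function names are hypothetical, and the sketch makes no claim about the universality conditions established in the paper.

```python
import math
import random


def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1-D samples.

    For empirical measures with the same number of atoms, the optimal
    coupling pairs sorted values, so W_p is a mean over sorted pairs.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1.0 / p)


def sliced_wasserstein(x, y, n_dirs=100, p=2, seed=0):
    """Monte Carlo sliced W_p between two equal-size samples of points in R^d."""
    rng = random.Random(seed)
    d = len(x[0])
    total = 0.0
    for _ in range(n_dirs):
        # Uniform direction on the unit sphere: a normalized Gaussian vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        # Project both samples onto the direction and compare in 1-D.
        px = [sum(a * c for a, c in zip(pt, v)) for pt in x]
        py = [sum(a * c for a, c in zip(pt, v)) for pt in y]
        total += wasserstein_1d(px, py, p) ** p
    return (total / n_dirs) ** (1.0 / p)


def gaussian_kernel(dist, gamma=1.0):
    """Gaussian-type kernel exp(-dist^2 / gamma) on a metric (assumed form)."""
    return math.exp(-dist * dist / gamma)
```

Each projection costs only a sort, which illustrates why the sliced distance is computationally cheaper than solving a multivariate optimal transport problem.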

4.
J Bus Econ Stat ; 41(4): 1090-1100, 2023.
Article in English | MEDLINE | ID: mdl-38125739

ABSTRACT

Compositional data arise in a wide variety of research areas when some form of standardization and composition is necessary. Estimating covariance matrices is of fundamental importance for high-dimensional compositional data analysis. However, existing methods require the restrictive Gaussian or sub-Gaussian assumption, which may not hold in practice. We propose a robust composition adjusted thresholding covariance procedure based on Huber-type M-estimation to estimate the sparse covariance structure of high-dimensional compositional data. We introduce a cross-validation procedure to choose the tuning parameters of the proposed method. Theoretically, by assuming a bounded fourth moment condition, we obtain the rates of convergence and signal recovery property for the proposed method and provide theoretical guarantees for the cross-validation procedure under the high-dimensional setting. Numerically, we demonstrate the effectiveness of the proposed method in simulation studies and in a real application to sales data analysis.
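One simplified reading of a robust composition-adjusted thresholding procedure: apply the centered log-ratio (clr) transform, estimate each covariance entry with a Huber-type M-estimate of the mean of centered products instead of the sample mean, then hard-threshold small off-diagonal entries. This is a hypothetical sketch, not the authors' procedure; `delta` and `tau` are placeholders for the tuning parameters that the abstract selects by cross-validation, and the function names are illustrative.

```python
import math


def clr(row):
    """Centered log-ratio transform of one strictly positive composition."""
    logs = [math.log(v) for v in row]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


def huber_mean(values, delta, iters=50):
    """Huber-type M-estimate of location via iteratively reweighted means.

    Solves sum(psi(v - mu)) = 0 with psi the residual clipped at +/- delta,
    written as weights min(1, delta / |v - mu|).
    """
    mu = sorted(values)[len(values) // 2]  # start at the (upper) median
    for _ in range(iters):
        weights = [1.0 if abs(v - mu) <= delta else delta / abs(v - mu) for v in values]
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mu


def robust_thresholded_cov(data, delta, tau):
    """clr-transform rows, Huber-estimate each covariance entry, threshold off-diagonals."""
    z = [clr(row) for row in data]
    p = len(z[0])
    mu = [huber_mean([r[j] for r in z], delta) for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            prods = [(r[j] - mu[j]) * (r[k] - mu[k]) for r in z]
            s = huber_mean(prods, delta)
            if j != k and abs(s) <= tau:
                s = 0.0  # hard thresholding encodes the assumed sparsity
            cov[j][k] = cov[k][j] = s
    return cov
```

Replacing each sample average by a Huber-type estimate is what removes the need for Gaussian or sub-Gaussian tails in this kind of construction; the thresholding step then imposes sparsity.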

5.
Psychometrika ; 87(1): 83-106, 2022 03.
Article in English | MEDLINE | ID: mdl-34191228

ABSTRACT

Graphical models have received an increasing amount of attention in network psychometrics as a promising probabilistic approach to study the conditional relations among variables using graph theory. Despite recent advances, existing methods for graphical models usually assume a homogeneous population and focus on binary or continuous variables. However, ordinal variables are very common in many areas of psychological science, and the population often consists of several different groups based on the heterogeneity in ordinal data. Driven by these needs, we introduce the finite mixture of ordinal graphical models to effectively study the heterogeneous conditional dependence relationships of ordinal data. We develop a penalized likelihood approach for model estimation, and design a generalized expectation-maximization (EM) algorithm to address the significant computational challenges. We examine the performance of the proposed method and algorithm in simulation studies. Moreover, we demonstrate the potential usefulness of the proposed method in psychological science through a real application concerning the interests and attitudes related to fan avidity for students in a large public university in the United States.


Subject(s)
Algorithms , Computer Simulation , Humans , Likelihood Functions , Psychometrics
6.
Biometrics ; 77(3): 984-995, 2021 09.
Article in English | MEDLINE | ID: mdl-32683674

ABSTRACT

A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine-mapping of the microbiome, we propose a two-step compositional knockoff filter to provide effective finite-sample false discovery rate (FDR) control in high-dimensional linear log-contrast regression analysis of microbiome compositional data. In the first step, we propose a new compositional screening procedure to remove insignificant microbial taxa while retaining the essential sum-to-zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. Thereby, a subset of the microbes is selected from the high-dimensional microbial taxa as related to the response under a prespecified FDR threshold. We study the theoretical properties of the proposed two-step procedure, including both sure screening and effective false discovery control. We demonstrate these properties in numerical simulation studies that compare our method with existing ones and show a power gain of the new method while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with an application to an inflammatory bowel disease data set to identify microbial taxa that influence host gene expression.


Subject(s)
Microbiota , Computer Simulation , Data Analysis , Microbiota/genetics , Regression Analysis , Research Design
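The second step applies a knockoff filter. Once knockoff statistics W_j have been computed (positive values favor the original taxon over its knockoff), selection reduces to the standard knockoff+ data-dependent threshold, sketched below. The compositional screening step and the construction of compositional knockoffs from the paper are not shown, and the function names are hypothetical.

```python
import math


def knockoff_threshold(w, q, offset=1):
    """Data-dependent knockoff threshold (offset=1 gives the knockoff+ variant).

    w: knockoff statistics W_j. Returns the smallest t among the nonzero
    |W_j| whose estimated false discovery proportion
    (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) is at most q,
    or infinity if no candidate qualifies.
    """
    for t in sorted({abs(v) for v in w if v != 0}):
        num = offset + sum(1 for v in w if v <= -t)
        den = max(1, sum(1 for v in w if v >= t))
        if num / den <= q:
            return t
    return math.inf


def knockoff_select(w, q, offset=1):
    """Indices of variables whose statistic clears the knockoff threshold."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, v in enumerate(w) if v >= t]
```

Large negative statistics act as estimated false discoveries, so the threshold adapts to how often knockoffs beat the originals.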
7.
J Multivar Anal ; 175, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32863458

ABSTRACT

Dynamic networks are a general language for describing time-evolving complex systems, and discrete time network models provide an emerging statistical technique for various applications. It is a fundamental research question to detect a set of nodes sharing similar connectivity patterns in time-evolving networks. Our work is primarily motivated by detecting groups based on interesting features of the time-evolving networks (e.g., stability). In this work, we propose a model-based clustering framework for time-evolving networks based on discrete time exponential-family random graph models, which simultaneously allows both modeling and detecting group structure. To choose the number of groups, we use the conditional likelihood to construct an effective model selection criterion. Furthermore, we propose an efficient variational expectation-maximization (EM) algorithm to find approximate maximum likelihood estimates of network parameters and mixing proportions. The power of our method is demonstrated in simulation studies and empirical applications to international trade networks and the collaboration networks of a large research university.

8.
Environ Sci Technol ; 54(14): 8632-8639, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32603095

ABSTRACT

Chemical spills in streams can impact ecosystem or human health. Typically, the public learns of spills from reports from industry, media, or government rather than monitoring data. For example, ∼1300 spills (76 ≥ 400 gallons or ∼1500 L) were reported from 2007 to 2014 by the regulator for natural gas wellpads in the Marcellus shale region of Pennsylvania (U.S.), a region of extensive drilling and hydraulic fracturing. Only one such incident of stream contamination in Pennsylvania has been documented with water quality data in peer-reviewed literature. This could indicate that spills (1) were small or contained on wellpads, (2) were diluted, biodegraded, or obscured by other contaminants, (3) were not detected because of sparse monitoring, or (4) were not detected because of the difficulties of inspecting data for complex stream networks. As a first step in addressing the last problem, we developed a geospatial-analysis tool, GeoNet, that analyzes stream networks to detect statistically significant changes between background and potentially impacted sites. GeoNet was used on data in the Water Quality Portal for the Pennsylvania Marcellus region. With the most stringent statistical tests, GeoNet detected 0.2% to 2% of the known contamination incidents (Na ± Cl) in streams. With denser sensor networks, tools like GeoNet could allow real-time detection of polluting events.


Subject(s)
Natural Gas , Water Pollutants, Chemical , Ecosystem , Environmental Monitoring , Humans , Oil and Gas Fields , Pennsylvania , Rivers , Water Pollutants, Chemical/analysis
9.
Technometrics ; 62(2): 161-172, 2020.
Article in English | MEDLINE | ID: mdl-33716325

ABSTRACT

Water pollution is a major global environmental problem, and it poses a great environmental risk to public health and biological diversity. This work is motivated by assessing the potential environmental threat of coal mining through increased sulfate concentrations in river networks, which do not follow any simple parametric distribution. However, existing network models mainly focus on binary or discrete networks and weighted networks with known parametric weight distributions. We propose a principled nonparametric weighted network model based on exponential-family random graph models and local likelihood estimation, and study its model-based clustering with application to large-scale water pollution network analysis. We do not require any parametric distribution assumption on network weights. The proposed method greatly extends the methodology and applicability of statistical network models. Furthermore, it is scalable to large and complex networks in large-scale environmental studies. The power of our proposed methods is demonstrated in simulation studies and a real application to sulfate pollution network analysis in the Ohio watershed located in Pennsylvania, United States.

10.
Front Genet ; 10: 350, 2019.
Article in English | MEDLINE | ID: mdl-31068967

ABSTRACT

Differential abundance analysis is a crucial task in many microbiome studies, where the central goal is to identify microbiome taxa associated with certain biological or clinical conditions. There are two different modes of microbiome differential abundance analysis: the individual-based univariate differential abundance analysis and the group-based multivariate differential abundance analysis. The univariate analysis identifies differentially abundant microbiome taxa subject to multiple-testing correction under a chosen error measure such as the false discovery rate, which is typically complicated by the high dimensionality of taxa and the complex correlation structure among taxa. The multivariate analysis evaluates the overall shift in the abundance of microbiome composition between two conditions, which provides useful preliminary information on whether follow-up validation studies are needed. In this paper, we present a novel Adaptive multivariate two-sample test for Microbiome Differential Analysis (AMDA) to examine whether the composition of a taxa set differs between two conditions. Our simulation studies and real data applications demonstrated that the AMDA test was often more powerful than several competing methods while preserving the correct type I error rate. A free implementation of our AMDA method in R software is available at https://github.com/xyz5074/AMDA.
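The adaptive AMDA statistic is not specified in the abstract. As a point of reference only, a generic permutation two-sample test for an overall shift in mean composition can be sketched as follows; the statistic (sum over taxa of squared differences in group means), the number of permutations and the function names are assumptions, and this is not the AMDA statistic.

```python
import random


def mean_vector(rows):
    """Per-taxon mean abundance across the rows of one group."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_mean_diff(a, b):
    """Sum over taxa of squared differences in group mean abundance."""
    return sum((u - v) ** 2 for u, v in zip(mean_vector(a), mean_vector(b)))


def permutation_test(a, b, n_perm=999, seed=0):
    """Generic permutation p-value for a multivariate two-sample shift."""
    rng = random.Random(seed)
    observed = sq_mean_diff(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel samples at random under the null
        if sq_mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # The +1 terms count the observed labeling, keeping the p-value valid.
    return (1 + hits) / (1 + n_perm)
```

A single global statistic like this tests the multivariate null of no overall shift; it does not say which taxa differ, which is the role of the univariate analysis described above.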

11.
Nat Mater ; 18(7): 760-769, 2019 07.
Article in English | MEDLINE | ID: mdl-30911119

ABSTRACT

Integrins are membrane receptors that mediate cell adhesion and mechanosensing. The structure-function relationship of integrins remains incompletely understood, despite extensive studies motivated by its importance to basic cell biology and translational medicine. Using a fluorescence dual biomembrane force probe, microfluidics and cone-and-plate rheometry, we applied precisely controlled mechanical stimulations to platelets and identified an intermediate state of integrin αIIbβ3 that is characterized by an ectodomain conformation, ligand affinity and bond lifetimes that are all intermediate between the well-known inactive and active states. This intermediate state is induced by ligand engagement of glycoprotein (GP) Ibα via a mechanosignalling pathway and potentiates the outside-in mechanosignalling of αIIbβ3 for further transition to the active state during integrin mechanical affinity maturation. Our work reveals distinct αIIbβ3 state transitions in response to biomechanical and biochemical stimuli, and identifies a role for the αIIbβ3 intermediate state in promoting biomechanical platelet aggregation.


Subject(s)
Mechanical Phenomena , Platelet Aggregation , Platelet Glycoprotein GPIIb-IIIa Complex/metabolism , Biomechanical Phenomena , Humans , Ligands , Signal Transduction
12.
Environ Sci Process Impacts ; 21(2): 384-396, 2019 Feb 21.
Article in English | MEDLINE | ID: mdl-30608109

ABSTRACT

With recent improvements in high-volume hydraulic fracturing (HVHF, known to the public as fracking), vast new reservoirs of natural gas and oil are now being tapped. As HVHF has expanded into the populous northeastern USA, some residents have become concerned about impacts on water quality. Scientists have addressed this concern by investigating individual case studies or by statistically assessing the rate of problems. In general, however, lack of access to new or historical water quality data hinders the latter assessments. We introduce a new statistical approach to assess water quality datasets (especially sets that differ in data volume and variance) and apply the technique to one region of intense shale gas development in northeastern Pennsylvania (PA) and one with fewer shale gas wells in northwestern PA. The new analysis for the intensely developed region corroborates an earlier analysis based on a different statistical test: in that area, changes in groundwater chemistry show no degradation despite that area's dense development of shale gas. In contrast, in the region with fewer shale gas wells, we observe slight but statistically significant increases in the concentrations of some solutes in groundwaters. One potential explanation for the slight changes in groundwater chemistry in that area (northwestern PA) is that it is the regional focus of the earliest commercial development of conventional oil and gas (O&G) in the USA. Alternative explanations include the use of brines from conventional O&G wells as well as other salt mixtures on roads in that area for dust abatement or de-icing, respectively.


Subject(s)
Groundwater/chemistry , Hydraulic Fracking , Natural Gas/analysis , Petroleum/analysis , Water Pollutants, Chemical/analysis , Water/analysis , Oil and Gas Fields , Pennsylvania , Water Quality
13.
PLoS Comput Biol ; 14(9): e1006436, 2018 09.
Article in English | MEDLINE | ID: mdl-30240439

ABSTRACT

Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power of the analysis. One possible approach to constructing the co-expression network is to use the Gaussian graphical model. Though several methods are available for joint estimation of multiple graphical models, they do not fully account for the heterogeneity between samples and between co-expression patterns introduced by condition specificity. Here we develop the condition-adaptive fused graphical lasso (CFGL), a data-driven approach to incorporate condition specificity in the estimation of co-expression networks. We show that this method improves the accuracy with which networks are learned. The application of this method to a rat multi-tissue dataset and The Cancer Genome Atlas (TCGA) breast cancer dataset provides interesting biological insights. In both analyses, we identify numerous modules enriched for Gene Ontology functions and observe that the modules that are upregulated in a particular condition are often involved in condition-specific activities. Interestingly, we observe that the genes strongly associated with survival time in the TCGA dataset are less likely to be network hubs, suggesting that genes associated with cancer progression are likely to govern specific functions or execute final biological functions in pathways, rather than regulating a large number of biological processes. Additionally, we observe that the tumor-specific hub genes tend to have few shared edges with normal tissue, revealing tumor-specific regulatory mechanisms.


Subject(s)
Brain/metabolism , Breast Neoplasms/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Myocardium/metabolism , Algorithms , Animals , Area Under Curve , Breast Neoplasms/genetics , Computer Graphics , Computer Simulation , Databases, Factual , Female , Heart , Humans , Male , Neoplasms/metabolism , Normal Distribution , Rats , Software
14.
Genet Epidemiol ; 42(8): 772-782, 2018 12.
Article in English | MEDLINE | ID: mdl-30218543

ABSTRACT

Recent research has highlighted the importance of the human microbiome in many human diseases and health conditions. Most current microbiome association analyses focus on unrelated samples; such methods are not appropriate for analysis of data collected from more advanced study designs such as longitudinal and pedigree studies, where outcomes can be correlated. Ignoring such correlations can sometimes lead to suboptimal results or even biased conclusions. Thus, new methods to handle correlated outcome data in microbiome association studies are needed. In this paper, we propose the correlated sequence kernel association test (CSKAT) to address such correlations using the linear mixed model. Specifically, random effects are used to account for the outcome correlations and a variance component test is used to examine the microbiome effect. In contrast to existing genetic association tests for longitudinal and family samples, we implement a correction procedure to better calibrate the null distribution of the score test statistic to accommodate the small sample sizes typical of microbiome studies. Comprehensive simulation studies are conducted to demonstrate the validity and efficiency of our method, and we show that CSKAT achieves higher power than existing methods while correctly controlling the Type I error rate. We also apply our method to a microbiome data set collected from a UK twin study to illustrate its potential usefulness. A free implementation of our method in R software is available at https://github.com/jchen1981/SSKAT.


Subject(s)
Algorithms , Microbiota , Computer Simulation , Humans , Linear Models , Microbiota/genetics , Models, Genetic , Sample Size , Twins , United Kingdom
15.
Stat Surv ; 12: 105-135, 2018.
Article in English | MEDLINE | ID: mdl-31428219

ABSTRACT

We present a selective review of statistical modeling of dynamic networks. We focus on models with latent variables, specifically, the latent space models and the latent class models (or stochastic blockmodels), which investigate both the observed features and the unobserved structure of networks. We begin with an overview of the static models, and then we introduce the dynamic extensions. For each dynamic model, we also discuss its applications that have been studied in the literature, with the data sources listed in the Appendix. Based on the review, we summarize a list of open problems and challenges in dynamic network modeling with latent variables.

16.
Environ Geochem Health ; 40(2): 865-885, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29027593

ABSTRACT

Understanding how extraction of different energy sources impacts water resources requires assessing how water chemistry has changed in comparison with the background values of pristine streams. With such understanding, we can develop better water quality standards and ecological interpretations. However, determination of pristine background chemistry is difficult in areas with heavy human impact. To address this, we compiled a master dataset of sulfate and barium concentrations ([SO4], [Ba]) in Pennsylvania (PA, USA) streams from publicly available sources. These elements were chosen because they can represent contamination related to coal and oil/gas, respectively. We applied changepoint analysis (i.e., likelihood ratio test) to identify pristine streams, which we defined as streams with a low variability in concentrations as measured over years. From these pristine streams, we estimated the baseline concentrations for major bedrock types in PA. Overall, we found that 48,471 data values are available for [SO4] from 1904 to 2014 and 3243 data values for [Ba] from 1963 to 2014. The statewide [SO4] baseline was estimated to be 15.8 ± 9.6 mg/L, but values range from 12.4 to 26.7 mg/L for different bedrock types. The statewide [Ba] baseline is 27.7 ± 10.6 µg/L and values range from 25.8 to 38.7 µg/L. Results show that most increases in [SO4] from the baseline occurred in areas with intensive coal mining activities, confirming previous studies. Sulfate inputs from acid rain were also documented. Slight increases in [Ba] since 2007 and higher [Ba] in areas with higher densities of gas wells when compared to other areas could document impacts from shale gas development, the prevalence of basin brines, or decreases in acid rain and its coupled effects on [Ba] related to barite solubility. The largest impacts on PA stream [Ba] and [SO4] are related to releases from coal mining or burning rather than oil and gas development.


Subject(s)
Acid Rain , Barium/analysis , Coal Mining , Hydraulic Fracking , Natural Gas , Rivers , Sulfates/analysis , Water Pollutants, Chemical/analysis , Appalachian Region , Datasets as Topic , Geology , Human Activities , Humans , Pennsylvania , Time Factors
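The changepoint step can be illustrated with a likelihood ratio scan for a single shift in mean, assuming independent Gaussian observations with a known common standard deviation. The study's exact model, data transformation and significance calibration are not stated in the abstract, so this is a hypothetical sketch rather than the procedure used; `mean_changepoint` and `rss` are illustrative names.

```python
def rss(seg):
    """Residual sum of squares of a segment around its own mean."""
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def mean_changepoint(x, sigma=1.0):
    """Single mean-shift changepoint via a Gaussian likelihood ratio.

    Assumes independent observations with known common sd `sigma` and
    at least two values. Returns (k, stat): the change is estimated
    between x[k-1] and x[k], and stat = 2 * log(likelihood ratio)
    = (RSS_null - RSS_split) / sigma^2, maximized over k.
    """
    null = rss(x)
    best_k, best_stat = None, 0.0
    for k in range(1, len(x)):
        stat = (null - rss(x[:k]) - rss(x[k:])) / sigma ** 2
        if best_k is None or stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat
```

Under this reading, a series whose maximal statistic stays below a chosen critical value shows no detectable shift, which matches the abstract's notion of a low-variability (pristine) stream; how that critical value is set is not specified in the abstract.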
17.
J Econom ; 201(2): 292-306, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29731537

ABSTRACT

We consider forecasting a single time series when there is a large number of predictors and a possible nonlinear effect. The dimensionality is first reduced via a high-dimensional (approximate) factor model implemented by the principal component analysis. Using the extracted factors, we develop a novel forecasting method called the sufficient forecasting, which provides a set of sufficient predictive indices, inferred from high-dimensional predictors, to deliver additional predictive power. The projected principal component analysis is employed to enhance the accuracy of inferred factors when a semi-parametric (approximate) factor model is assumed. Our method is also applicable to cross-sectional sufficient regression using extracted factors. The connection between the sufficient forecasting and the deep learning architecture is explicitly stated. The sufficient forecasting correctly estimates projection indices of the underlying factors even in the presence of a nonparametric forecasting function. The proposed method extends the sufficient dimension reduction to high-dimensional regimes by condensing the cross-sectional information through factor models. We derive asymptotic properties for the estimate of the central subspace spanned by these projection directions as well as the estimates of the sufficient predictive indices. We further show that the natural method of running multiple regression of target on estimated factors yields a linear estimate that actually falls into this central subspace. Our method and theory allow the number of predictors to be larger than the number of observations. We finally demonstrate that the sufficient forecasting improves upon the linear forecasting in both simulation studies and an empirical study of forecasting macroeconomic variables.

18.
Elife ; 5, 2016 07 19.
Article in English | MEDLINE | ID: mdl-27434669

ABSTRACT

How cells sense their mechanical environment and transduce forces into biochemical signals is a crucial yet unresolved question in mechanobiology. Platelets use receptor glycoprotein Ib (GPIb), specifically its α subunit (GPIbα), to signal as they tether and translocate on von Willebrand factor (VWF) of injured arterial surfaces against blood flow. Force elicits catch bonds to slow VWF-GPIbα dissociation and unfolds the GPIbα leucine-rich repeat domain (LRRD) and juxtamembrane mechanosensitive domain (MSD). How these mechanical processes trigger biochemical signals remains unknown. Here we analyze these extracellular events and the resulting intracellular Ca(2+) on a single platelet in real time, revealing that LRRD unfolding intensifies Ca(2+) signal whereas MSD unfolding affects the type of Ca(2+) signal. Therefore, LRRD and MSD are analog and digital force transducers, respectively. The >30 nm macroglycopeptide separating the two domains transmits force on the VWF-GPIbα bond (whose lifetime is prolonged by LRRD unfolding) to the MSD to enhance its unfolding, resulting in unfolding cooperativity at an optimal force. These elements may provide design principles for a generic mechanosensory protein machine.


Subject(s)
Blood Platelets/physiology , Calcium/metabolism , Mechanoreceptors/metabolism , Platelet Glycoprotein GPIb-IX Complex/metabolism , von Willebrand Factor/metabolism , Humans , Protein Binding , Protein Folding
19.
J Am Stat Assoc ; 111(516): 1726-1735, 2016.
Article in English | MEDLINE | ID: mdl-29097827

ABSTRACT

We consider estimating multi-task quantile regression under the transnormal model, with a focus on the high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of the alternating direction method of multipliers, a nearest correlation matrix projection is introduced that inherits the sampling properties of the unprojected one. Our work combines the strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality for high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves an "oracle"-like convergence rate, and provides a provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to analyze protein mass spectroscopy data.
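Under a transnormal (Gaussian copula) model, one common rank-based estimate of the latent correlation maps Kendall's tau through `sin(pi * tau / 2)`, which is invariant to monotone transformations of each variable. Whether the paper uses Kendall's tau or another rank statistic is an assumption here, and the function names are illustrative. The resulting matrix need not be positive definite, which is one reason for positive definite constraints and projection steps like those described in the abstract (not shown).

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction) between two equal-length samples."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def rank_correlation_matrix(columns):
    """Latent correlation estimate sin(pi/2 * tau) for each pair of columns."""
    p = len(columns)
    r = [[1.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            val = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
            r[j][k] = r[k][j] = val
    return r
```

Because only the ordering of observations enters, applying any strictly increasing transformation to a variable (for example `exp` or a cube) leaves the estimate unchanged, which is how rank-based methods accommodate the nonnormality mentioned in the abstract.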

20.
Ann Stat ; 42(3): 819-849, 2014 Jun.
Article in English | MEDLINE | ID: mdl-25598560

ABSTRACT

Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue remains: it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap, which has persisted for over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely, it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.
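The one-step local linear approximation (LLA) replaces the folded concave penalty by its tangent at an initial estimate, which turns the problem into a weighted lasso. The sketch below assumes the SCAD penalty with `a = 3.7`, the loss `(1/2n)||y - X b||^2`, a plain coordinate-descent solver and a user-supplied initial estimate; it is a toy illustration, not the paper's code, and all function names are hypothetical.

```python
def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at t >= 0."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def soft_threshold(z, g):
    """Soft-thresholding operator sign(z) * max(|z| - g, 0)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def weighted_lasso(X, y, w, sweeps=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + sum_j w_j |b_j|.

    Assumes every column of X has at least one nonzero entry.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual excluding j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, w[j]) / z
    return b


def one_step_lla(X, y, beta0, lam, a=3.7):
    """One LLA step: lasso weights are the SCAD derivative at the initial estimate."""
    w = [scad_derivative(abs(bj), lam, a) for bj in beta0]
    return weighted_lasso(X, y, w)
```

In the toy check, the least-squares start `[2.0, 0.2]` with `lam = 0.5` receives weights `[0, 0.5]`, so the large coefficient is left unpenalized and the small one is set to zero, mirroring how a well-behaved initial estimate lets one LLA step recover an oracle-like solution.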
