1.
J Econom ; 232(2): 367-388, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36776480

ABSTRACT

Quantile regression is a powerful tool for learning the relationship between a response variable and a multivariate predictor while exploring heterogeneous effects. This paper focuses on statistical inference for quantile regression in the "increasing dimension" regime. We provide a comprehensive analysis of a convolution smoothed approach that achieves adequate approximation to computation and inference for quantile regression. This method, which we refer to as conquer, turns the non-differentiable check function into a twice-differentiable, convex and locally strongly convex surrogate, which admits fast and scalable gradient-based algorithms to perform optimization, and multiplier bootstrap for statistical inference. Theoretically, we establish explicit non-asymptotic bounds on estimation and Bahadur-Kiefer linearization errors, from which we show that the asymptotic normality of the conquer estimator holds under a weaker requirement on dimensionality than needed for conventional quantile regression. The validity of multiplier bootstrap is also provided. Numerical studies confirm conquer as a practical and reliable approach to large-scale inference for quantile regression. Software implementing the methodology is available in the R package conquer.
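
The smoothing idea described above can be sketched in a few lines. This is a minimal illustration, not the conquer package's implementation: it assumes a Gaussian kernel (the abstract does not name one), and the function names and the bandwidth argument `h` are hypothetical. Convolving the check function ρ_τ(u) = u(τ − 1{u < 0}) with a Gaussian kernel of bandwidth h gives the closed form τu − uΦ(−u/h) + hφ(u/h), whose derivative is τ − Φ(−u/h) and whose second derivative φ(u/h)/h is strictly positive, so the surrogate is smooth and convex.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def check_loss(u, tau):
    """Non-smooth quantile (check) loss: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a Gaussian kernel of bandwidth h.

    Assumption: Gaussian kernel; other kernels give other closed forms.
    """
    return tau * u - u * std_normal_cdf(-u / h) + h * std_normal_pdf(u / h)


def smoothed_check_grad(u, tau, h):
    """Derivative of the smoothed loss with respect to u."""
    return tau - std_normal_cdf(-u / h)
```

As h shrinks toward zero the surrogate approaches the original check loss, while for h > 0 its gradient is defined everywhere, which is what makes gradient-based optimization applicable.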

2.
Ann Stat ; 46(3): 989-1017, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29942099

ABSTRACT

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries by such data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions on exogeneity of covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable Y with the best s linear combinations of p covariates X, even when X and Y are independent. When the covariance matrix of X possesses the restricted eigenvalue property, we derive such distributions for both finite s and diverging s, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of X. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where residuals are from regularized fits. Our approach is then applied to construct the upper confidence limit for the maximum spurious correlation and testing exogeneity of covariates. The former provides a baseline for guarding against false discoveries due to data mining and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated by both numerical examples and real data analysis.
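
To make the notion of spurious correlation concrete, the following sketch simulates the simplest case (s = 1): the largest absolute sample correlation between a response and p covariates that are all generated independently of it. This is a plain Monte Carlo illustration under assumed standard normal data, not the paper's multiplier bootstrap procedure; all function names and parameters are hypothetical.

```python
import math
import random


def pearson_corr(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_corr(n, p, rng):
    """Max |corr(X_j, Y)| over p covariates drawn independently of Y."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, abs(pearson_corr(x, y)))
    return best


def mean_max_spurious_corr(n, p, reps, seed=0):
    """Average the maximum spurious correlation over several replications."""
    rng = random.Random(seed)
    return sum(max_spurious_corr(n, p, rng) for _ in range(reps)) / reps
```

Comparing, say, `mean_max_spurious_corr(30, 1, 20)` with `mean_max_spurious_corr(30, 100, 20)` shows how the best-looking correlation grows with the number of candidate covariates even though none of them is related to the response.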

3.
Ann Stat ; 46(5): 1904-1931, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30220745

ABSTRACT

Heavy-tailed errors impair the accuracy of the least squares estimate, which can be spoiled by a single grossly outlying observation. As argued in the seminal work of Peter Huber in 1973 [Ann. Statist. 1 (1973) 799-821], robust alternatives to the method of least squares are sorely needed. To achieve robustness against heavy-tailed sampling distributions, we revisit the Huber estimator from a new perspective by letting the tuning parameter involved diverge with the sample size. In this paper, we develop nonasymptotic concentration results for such an adaptive Huber estimator, namely, the Huber estimator with the tuning parameter adapted to sample size, dimension, and the variance of the noise. Specifically, we obtain a sub-Gaussian-type deviation inequality and a nonasymptotic Bahadur representation when noise variables only have finite second moments. The nonasymptotic results further yield two conventional normal approximation results that are of independent interest, the Berry-Esseen inequality and Cramér-type moderate deviation. As an important application to large-scale simultaneous inference, we apply these robust normal approximation results to analyze a dependence-adjusted multiple testing procedure for moderately heavy-tailed data. It is shown that the robust dependence-adjusted procedure asymptotically controls the overall false discovery proportion at the nominal level under mild moment conditions. Thorough numerical results on both simulated and real datasets are also provided to back up our theory.

4.
Biometrics ; 73(1): 31-41, 2017 03.
Article in English | MEDLINE | ID: mdl-27377648

ABSTRACT

Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN.


Subject(s)
Analysis of Variance; Cluster Analysis; Genomics/methods; Algorithms; Asthma/genetics; Computer Simulation; Data Interpretation, Statistical; Gene Expression Profiling; Genomics/statistics & numerical data; Humans; Sample Size
5.
Biometrics ; 73(4): 1300-1310, 2017 12.
Article in English | MEDLINE | ID: mdl-28369742

ABSTRACT

In this article, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and the parametric bootstrap techniques to compute the critical values. Different from the existing tests that heavily rely on the structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy a wide scope of applicability in practice. To enhance powers of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and a human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may provide assistance in detecting disease-associated gene-sets. The proposed methods have been implemented in an R-package HDtest and are available on CRAN.


Subject(s)
Computer Simulation; Genetic Association Studies; Data Interpretation, Statistical; Gene Expression; Genetic Association Studies/statistics & numerical data; Humans; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics
6.
Biomed Environ Sci ; 37(6): 617-627, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38988112

ABSTRACT

Objective: The aim of this study was to explore the role and mechanism of ferroptosis in SiO2-induced cardiac injury using a mouse model. Methods: Male C57BL/6 mice were intratracheally instilled with SiO2 to create a silicosis model. Ferrostatin-1 (Fer-1) and deferoxamine (DFO) were used to suppress ferroptosis. Serum biomarkers, oxidative stress markers, histopathology, iron content, and the expression of ferroptosis-related proteins were assessed. Results: SiO2 altered serum cardiac injury biomarkers, oxidative stress, iron accumulation, and ferroptosis markers in myocardial tissue. Fer-1 and DFO reduced lipid peroxidation and iron overload, and alleviated SiO2-induced mitochondrial damage and myocardial injury. SiO2 inhibited Nuclear factor erythroid 2-related factor 2 (Nrf2) and its downstream antioxidant genes, while Fer-1 more potently reactivated Nrf2 compared to DFO. Conclusion: Iron overload-induced ferroptosis contributes to SiO2-induced cardiac injury. Targeting ferroptosis by reducing iron accumulation or inhibiting lipid peroxidation protects against SiO2 cardiotoxicity, potentially via modulation of the Nrf2 pathway.


Subject(s)
Disease Models, Animal; Ferroptosis; Iron Overload; Mice, Inbred C57BL; Myocytes, Cardiac; Silicon Dioxide; Silicosis; Animals; Ferroptosis/drug effects; Male; Mice; Iron Overload/metabolism; Silicon Dioxide/toxicity; Silicosis/metabolism; Silicosis/drug therapy; Silicosis/pathology; Myocytes, Cardiac/drug effects; Myocytes, Cardiac/metabolism; Deferoxamine/pharmacology; Phenylenediamines/pharmacology; NF-E2-Related Factor 2/metabolism; NF-E2-Related Factor 2/genetics; Oxidative Stress/drug effects; Iron/metabolism; Cyclohexylamines/pharmacology
7.
Huan Jing Ke Xue ; 44(11): 5986-5996, 2023 Nov 08.
Article in Chinese | MEDLINE | ID: mdl-37973083

ABSTRACT

The characteristics and main causal factors of haze in Zhoukou in January 2022 were analyzed. Six air pollutants, water-soluble ions, elements, OC, EC, and other parameters in fine particulate matter were monitored and analyzed using a set of online high-time-resolution instruments in an urban area. The results showed that secondary inorganic aerosols (SNA), carbonaceous aerosols (CA, including organic carbon OC and elemental carbon EC), and reconstructed crustal materials (CM, such as Al2O3, SiO2, CaO, and Fe2O3) were the three main components, accounting for 61.3%, 24.3%, and 9.72% of PM2.5, respectively. The concentrations of SNA, CA, CM, and SOA increased with higher AQI. The sulfur oxidation rate (SOR) and nitrogen oxidation rate (NOR) in January were 0.53 and 0.46, respectively. The growth rates [µg·(m³·h)⁻¹] of sulfate and nitrate were 0.027 (range: -5.89 to 9.47) and 0.051 (range: -23.1 to 12.4), respectively. During the haze period, the growth rates of sulfate and nitrate were 0.13 µg·(m³·h)⁻¹ and 0.24 µg·(m³·h)⁻¹, which were 4.8 and 4.7 times the January average, respectively. Although the sulfur oxidation rate was greater than the nitrogen oxidation rate, the growth rate of nitrate was approximately 1.8 times that of sulfate owing to the difference in the concentration of gaseous precursors and the influence of relative humidity. The growth rates of nitrate in SNA were significantly higher than those of sulfate on heavily polluted days. The values of SOR and NOR and the concentrations of SNA and SOA were higher during periods of higher AQI and humidity than during periods of lower AQI and humidity. Ox (NO2 + O3) decreased as relative humidity increased. SOA was higher at nighttime and increased faster with humidity than in the daytime. Under conditions of lower temperature, higher humidity, and lower wind speed, the emission of gaseous precursors of SNA requires further attention in Zhoukou in winter. Advanced strategies for controlling SO2 and NO2 emissions, for example from mobile and coal-burning sources, could efficiently reduce winter haze peaks.
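
The abstract reports SOR and NOR values without stating how they are computed. A minimal sketch follows, assuming the definition widely used in the aerosol literature: the molar fraction of sulfur (or nitrogen) present as particulate sulfate (or nitrate) relative to the sum of the particulate species and its gaseous precursor. The function names, the choice of NO2 as the only nitrogen precursor, and the µg/m³ mass-concentration inputs are assumptions for illustration, not details taken from the study.

```python
# Approximate molar masses in g/mol.
M_SO4 = 96.06  # sulfate ion
M_SO2 = 64.07  # sulfur dioxide
M_NO3 = 62.00  # nitrate ion
M_NO2 = 46.01  # nitrogen dioxide


def oxidation_ratio(product_ugm3, precursor_ugm3, m_product, m_precursor):
    """Molar ratio product / (product + precursor) from mass concentrations."""
    n_product = product_ugm3 / m_product
    n_precursor = precursor_ugm3 / m_precursor
    return n_product / (n_product + n_precursor)


def sor(sulfate_ugm3, so2_ugm3):
    """Sulfur oxidation ratio (assumed definition, see lead-in)."""
    return oxidation_ratio(sulfate_ugm3, so2_ugm3, M_SO4, M_SO2)


def nor(nitrate_ugm3, no2_ugm3):
    """Nitrogen oxidation ratio (assumed definition, see lead-in)."""
    return oxidation_ratio(nitrate_ugm3, no2_ugm3, M_NO3, M_NO2)
```

Under this definition both ratios lie between 0 and 1, and a larger value indicates that more of the gaseous precursor has been converted to secondary aerosol.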

8.
J Am Stat Assoc ; 115(529): 254-265, 2020.
Article in English | MEDLINE | ID: mdl-33139964

ABSTRACT

Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded (1 + δ)-th moment for any δ > 0. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when δ ≥ 1, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime 0 < δ < 1 and the transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive.
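
The core idea, a Huber-type estimator whose robustification parameter grows with the sample size, can be sketched for the simplest case of estimating a mean. This is an illustration only, not the authors' software: the scaling `c * stdev * sqrt(n / log n)` in `adaptive_tau` is one hypothetical choice for the finite-variance case, the constant `c` is arbitrary, and all function names are assumptions.

```python
import math
import statistics


def huber_mean(data, tau, tol=1e-10, max_iter=1000):
    """Huber location estimate: solves sum(clip(x - mu, -tau, tau)) = 0.

    Uses unit-step gradient descent on the average Huber loss,
    starting from the sample median.
    """
    mu = statistics.median(data)
    for _ in range(max_iter):
        # Average of residuals clipped to [-tau, tau].
        step = sum(max(-tau, min(tau, x - mu)) for x in data) / len(data)
        mu += step
        if abs(step) < tol:
            break
    return mu


def adaptive_tau(data, c=1.0):
    """Hypothetical sample-size-dependent choice of tau (needs n >= 2)."""
    n = len(data)
    return c * statistics.stdev(data) * math.sqrt(n / math.log(n))
```

For example, `huber_mean([0, 0, 0, 0, 100], 1.0)` stays close to the bulk of the data, whereas a very large `tau` recovers the ordinary sample mean; letting `tau` grow with n trades a small bias for robustness.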

9.
Zhongguo Dang Dai Er Ke Za Zhi ; 11(12): 976-9, 2009 Dec.
Article in Chinese | MEDLINE | ID: mdl-20113602

ABSTRACT

OBJECTIVE: High noise levels (>70 dB) in the neonatal intensive care unit (NICU) are common in some primary hospitals. This study aimed to investigate the effects of NICU noise on the auditory system and intelligence development in premature infants. METHODS: One hundred premature infants with respiratory distress syndrome who needed mechanical ventilation therapy were randomly divided into observation and control groups according to the use of earmuffs. Mechanical ventilation therapy lasted for 2 to 15 days in the two groups. After weaning from the mechanical ventilator, the auditory brainstem response, cranial B-ultrasonography, and the intelligence development assessment were performed. RESULTS: The percentages of total (23% vs 47%) and mild hearing loss (15% vs 35%) in the observation group were significantly lower than those in the control group (p<0.05) 2 to 3 days after weaning from the mechanical ventilator. The incidence of periventricular hemorrhage-intraventricular hemorrhage (PVH-IVH) or periventricular leukomalacia (PVL) in the observation group was significantly lower than that in the control group (21% vs 42%; p<0.05). The intelligence development assessments performed at 6 and 12 months of life showed that the mental development index and the psychomotor development index in the observation group were much higher than those in the control group (p<0.05). CONCLUSIONS: Noise in the NICU can result in mild hearing loss and retardation of intelligence development and increase the incidence of PVH-IVH and PVL in premature infants. The use of earmuffs may reduce these adverse events.


Subject(s)
Child Development; Hearing; Intelligence; Noise/adverse effects; Cerebral Hemorrhage/epidemiology; Female; Humans; Infant, Newborn; Infant, Premature; Intensive Care Units, Neonatal; Leukomalacia, Periventricular/epidemiology; Male
10.
J Am Stat Assoc ; 114(528): 1880-1893, 2019.
Article in English | MEDLINE | ID: mdl-33033420

ABSTRACT

Large-scale multiple testing with correlated and heavy-tailed data arises in a wide range of research areas, from genomics and medical imaging to finance. Conventional methods for estimating the false discovery proportion (FDP) often ignore the effect of heavy-tailedness and the dependence structure among test statistics, and thus may lead to inefficient or even inconsistent estimation. Also, the commonly imposed joint normality assumption is arguably too stringent for many applications. To address these challenges, in this paper we propose a Factor-Adjusted Robust Multiple Testing (FarmTest) procedure for large-scale simultaneous inference with control of the false discovery proportion. We demonstrate that robust factor adjustments are extremely important in both controlling the FDP and improving the power. We identify general conditions under which the proposed method produces a consistent estimate of the FDP. As a byproduct that is of independent interest, we establish an exponential-type deviation inequality for a robust U-type covariance estimator under the spectral norm. Extensive numerical experiments demonstrate the advantage of the proposed method over several state-of-the-art methods, especially when the data are generated from heavy-tailed distributions. The proposed procedures are implemented in the R-package FarmTest.

11.
ACS Appl Mater Interfaces ; 9(2): 1553-1561, 2017 Jan 18.
Article in English | MEDLINE | ID: mdl-27997793

ABSTRACT

The development of a three-dimensionally flexible, large-surface area, high-conductivity electrode is important to improve the low conductivity and utilization of active materials and restrict the shuttle of long-chain polysulfides in Li-polysulfide batteries. Herein, we constructed an integrated three-dimensional carbon nanotube forest/carbon cloth electrode with heteroatom doping and high electrical conductivity. The as-constructed electrode provides strong trapping of the polysulfide species and fast charge transfer. Therefore, the Li-polysulfide batteries with as-constructed electrodes achieved high specific capacities of ∼1200 and ∼800 mA h g⁻¹ at 0.1 and 1 C, respectively. After 300 cycles at 0.5 C, a specific capacity of 623 mA h g⁻¹ was retained.

12.
Article in English | MEDLINE | ID: mdl-28936128

ABSTRACT

Many data-mining and statistical machine learning algorithms have been developed to select a subset of covariates to associate with a response variable. Spurious discoveries can easily arise in high-dimensional data analysis due to the enormous number of possible selections. How can we know statistically whether our discoveries are better than those made by chance? In this paper, we define a measure of goodness of spurious fit, which shows how well a response variable can be fitted by an optimally selected subset of covariates under the null model, and propose a simple and effective LAMM algorithm to compute it. It coincides with the maximum spurious correlation for linear models and can be regarded as a generalized maximum spurious correlation. We derive the asymptotic distribution of such goodness of spurious fit for generalized linear models and L1-regression. Such an asymptotic distribution depends on the sample size, ambient dimension, the number of variables used in the fit, and the covariance information. It can be consistently estimated by multiplier bootstrapping and used as a benchmark to guard against spurious discoveries. It can also be applied to model selection, which considers only candidate models with goodness of fit better than that of spurious fits. The theory and method are convincingly illustrated by simulated examples and an application to the binary outcomes from German Neuroblastoma Trials.
