1.
Commun Stat Theory Methods ; 52(18): 6640-6657, 2023.
Article in English | MEDLINE | ID: mdl-37840573

ABSTRACT

We propose a new measure of variable importance in high-dimensional regression based on the change in the LASSO solution path when one covariate is left out. The proposed procedure provides a novel way to calculate variable importance and conduct variable screening. In addition, our procedure allows for the construction of p-values for testing whether each coefficient is equal to zero as well as for testing hypotheses involving multiple regression coefficients simultaneously; bootstrap techniques are used to construct the null distribution. For low-dimensional linear models, our method can achieve higher power than the t-test. Extensive simulations are provided to show the effectiveness of our method. In the high-dimensional setting, our proposed solution-path-based test achieves greater power than some other recently developed high-dimensional inference methods. We extend our method to logistic regression and demonstrate in simulation that our leave-one-covariate-out solution path tests can provide accurate p-values.
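
The leave-one-covariate-out idea in this abstract can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the abstract does not state the exact statistic, so the code uses the L1 distance between the two coefficient paths averaged over a short, hypothetical grid of penalty values, and it omits the bootstrap step that the authors use to obtain p-values. The helper names (`lasso_cd`, `loco_path_change`) and the toy data are likewise hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def loco_path_change(X, y, j, lambdas):
    """Assumed importance measure: mean L1 distance between the full LASSO
    path and the path fitted without covariate j (padded with 0 at position j)."""
    X_minus = [[x for k, x in enumerate(row) if k != j] for row in X]
    total = 0.0
    for lam in lambdas:
        full = lasso_cd(X, y, lam)
        reduced = lasso_cd(X_minus, y, lam)
        reduced.insert(j, 0.0)  # align lengths so the paths are comparable
        total += sum(abs(a - b) for a, b in zip(full, reduced))
    return total / len(lambdas)

# Hypothetical toy data: only covariate 0 affects the response.
random.seed(0)
n, p = 60, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] + random.gauss(0.0, 0.5) for row in X]
lambdas = [0.05, 0.1, 0.2, 0.4, 0.8]
importance = [loco_path_change(X, y, j, lambdas) for j in range(p)]
print([round(v, 3) for v in importance])
```

In this toy setting the path changes far more when the one active covariate is removed than when a noise covariate is removed, which is the behaviour a path-based importance measure is meant to capture.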

2.
Biometrics ; 75(1): 13-23, 2019 03.
Article in English | MEDLINE | ID: mdl-30267535

ABSTRACT

For disease screening, group (pooled) testing can be a cost-saving alternative to one-at-a-time testing, with savings realized through assaying pooled biospecimens (e.g., urine, blood, saliva). In many group testing settings, practitioners are faced with the task of conducting disease surveillance. That is, it is often of interest to relate individuals' true disease statuses to covariate information via binary regression. Several authors have developed regression methods for group testing data, which is challenging due to the effects of imperfect testing. That is, all testing outcomes (on pools and individuals) are subject to misclassification, and individuals' true statuses are never observed. To further complicate matters, individuals may be involved in several testing outcomes. For analyzing such data, we provide a novel regression methodology which generalizes and extends the aforementioned regression techniques and which incorporates regularization. Specifically, for model fitting and variable selection, we propose an adaptive elastic net estimator under the logistic regression model which can be used to analyze data from any group testing strategy. We provide an efficient algorithm for computing the estimator along with guidance on tuning parameter selection. Moreover, we establish the asymptotic properties of the proposed estimator and show that it possesses "oracle" properties. We evaluate the performance of the estimator through Monte Carlo studies and illustrate the methodology on a chlamydia data set from the State Hygienic Laboratory in Iowa City.
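
To make the role of imperfect testing concrete, the sketch below computes the probability that a pool tests positive when individual disease probabilities come from a logistic model. It is not the authors' estimator: the adaptive elastic net fit is not shown, the coefficients and covariate values are hypothetical, and the code assumes that the assay's sensitivity and specificity do not depend on pool size (no dilution effect). The second helper gives the standard expected number of tests per person for two-stage (Dorfman) pooling under a perfect assay, which is where the cost saving comes from.

```python
import math

def logistic(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def pool_positive_prob(probs, se, sp):
    """P(pool tests positive) for an assay with sensitivity se and specificity sp.

    Assumes individuals are independent and that se and sp do not vary
    with pool size (an assumption, not something stated in the abstract).
    """
    all_negative = math.prod(1.0 - p for p in probs)
    return se * (1.0 - all_negative) + (1.0 - sp) * all_negative

def dorfman_tests_per_person(p, k):
    """Expected tests per individual for two-stage pooling with a perfect assay,
    pools of size k, and common prevalence p."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

# Hypothetical coefficients and covariates for one pool of four individuals.
beta0, beta1 = -3.0, 0.8
pool_covariates = [0.0, 1.0, -0.5, 2.0]
probs = [logistic(beta0 + beta1 * x) for x in pool_covariates]
pool_prob = pool_positive_prob(probs, se=0.95, sp=0.98)
print(round(pool_prob, 4))
print(round(dorfman_tests_per_person(0.05, 5), 4))
```

A likelihood for group testing data is built from terms of this kind, which is why the individuals' true statuses never need to be observed directly.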


Subjects
Data Interpretation, Statistical; Mass Screening/methods; Algorithms; Chlamydia Infections/diagnosis; Computer Simulation; Humans; Mass Screening/economics; Mass Screening/statistics & numerical data; Monte Carlo Method; Regression Analysis
3.
J Am Stat Assoc ; 110(510): 837-849, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-26279594

ABSTRACT

We develop a test statistic for testing the equality of two population mean vectors in the "large-p-small-n" setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which breaks down the classic Hotelling T² test. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. Analyses of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme data set from The Cancer Genome Atlas are carried out to illustrate the test.
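
One simple way to sidestep the p-by-p covariance matrix is to work component by component. The sketch below averages squared unequal-variance (Welch-type) t-statistics across the p components. This is an illustrative assumption about the general idea, not the published generalized component test: the centering and scaling that would account for dependence between ordered components, and therefore any p-value, are deliberately left out. The function name and the simulated data are hypothetical.

```python
import random
from statistics import mean, variance

def componentwise_t2(X, Y):
    """Average of squared unequal-variance t-statistics over the p components.

    X and Y are lists of samples; each sample is a list of p values.
    No p-by-p covariance matrix is formed, so p may exceed the sample sizes.
    """
    n, m, p = len(X), len(Y), len(X[0])
    total = 0.0
    for j in range(p):
        xj = [row[j] for row in X]
        yj = [row[j] for row in Y]
        # Separate variance estimates: equal covariances are not assumed.
        se2 = variance(xj) / n + variance(yj) / m
        total += (mean(xj) - mean(yj)) ** 2 / se2
    return total / p

# Hypothetical "large-p-small-n" data: 10 samples per group, 200 components.
random.seed(1)
n, m, p = 10, 10, 200
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
Y_null = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)]
Y_shift = [[random.gauss(1.0, 1.0) for _ in range(p)] for _ in range(m)]
stat_null = componentwise_t2(X, Y_null)
stat_shift = componentwise_t2(X, Y_shift)
print(round(stat_null, 2), round(stat_shift, 2))
```

With equal means the average stays near 1, while a mean shift inflates it, even though Hotelling's statistic could not be computed here because the sample covariance matrix is singular when p exceeds the total sample size.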

4.
Article in English | MEDLINE | ID: mdl-26146492

ABSTRACT

Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to a glioblastoma multiforme data set from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features.
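
The distinction between within-platform and between-platform interactions can be sketched as follows. The code takes latent feature scores for a single sample, grouped by platform, and forms all pairwise products, labelling each one by whether both factors come from the same platform. The decomposition step (principal components, partial least squares, or non-negative matrix factorization) and the Bayesian model averaging step are not shown; the platform names, score values, and function name are hypothetical, and using plain products as interaction terms is an assumption.

```python
from itertools import combinations

def interaction_features(latent_by_platform):
    """Pairwise products of latent features for one sample.

    latent_by_platform maps a platform name to that sample's latent scores.
    Returns a dict keyed by (kind, feature_a, feature_b), where kind is
    "within" or "between" depending on whether the platforms match.
    """
    names = [
        (platform, k)
        for platform, scores in latent_by_platform.items()
        for k in range(len(scores))
    ]
    features = {}
    for (pa, ka), (pb, kb) in combinations(names, 2):
        kind = "within" if pa == pb else "between"
        value = latent_by_platform[pa][ka] * latent_by_platform[pb][kb]
        features[(kind, f"{pa}{ka}", f"{pb}{kb}")] = value
    return features

# Hypothetical latent scores: two expression and two methylation components.
sample = {"expr": [0.5, -1.0], "meth": [2.0, 0.1]}
features = interaction_features(sample)
n_within = sum(1 for key in features if key[0] == "within")
n_between = sum(1 for key in features if key[0] == "between")
print(n_within, n_between)
```

Because the products are formed on a handful of latent features rather than on the original platform-specific features, the number of candidate interaction terms stays manageable for a downstream model selection step.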


Subjects
Genomics/methods; Models, Statistical; Algorithms; Bayes Theorem; Computer Simulation; Least-Squares Analysis; Principal Component Analysis