A polynomial algorithm for best-subset selection problem.

Zhu, Junxian; Wen, Canhong; Zhu, Jin; Zhang, Heping; Wang, Xueqin

Affiliation

Zhu J; School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong 510275, China.
Wen C; School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China.
Zhu J; School of Mathematics, Sun Yat-sen University, Guangzhou, Guangdong 510275, China.
Zhang H; Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06525; heping.zhang@yale.edu.
Wang X; School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China; wangxq20@ustc.edu.cn.

Proc Natl Acad Sci U S A; 117(52): 33117-33123, 2020 Dec 29.

Article in English | MEDLINE | ID: mdl-33328272

ABSTRACT

Best-subset selection aims to find a small subset of predictors, so that the resulting linear model is expected to have the most desirable prediction accuracy. It is not only important and imperative in regression analysis but also has far-reaching applications in every facet of research, including computer science and medicine. We introduce a polynomial algorithm, which, under mild conditions, solves the problem. This algorithm exploits the idea of sequencing and splicing to reach a stable solution in finite steps when the sparsity level of the model is fixed but unknown. We define an information criterion that helps the algorithm select the true sparsity level with a high probability. We show that when the algorithm produces a stable optimal solution, that solution is the oracle estimator of the true parameters with probability one. We also demonstrate the power of the algorithm in several numerical studies.
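
To make the splicing idea in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a simplified rule: for a fixed subset size, exchange one active predictor for one inactive predictor whenever the exchange lowers the residual sum of squares, and stop when no exchange helps. The names `solve`, `rss`, `splice`, and `select_size` are hypothetical, the exhaustive search over exchanges is a stand-in for the paper's rule for choosing which variables to exchange, and the BIC-type penalty is a stand-in for the information criterion the paper defines.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is nonsingular (no collinear predictors)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the given columns."""
    cols = sorted(subset)
    gram = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in cols]
    beta = solve(gram, rhs)
    return sum((yi - sum(b * row[c] for b, c in zip(beta, cols))) ** 2
               for row, yi in zip(X, y))

def splice(X, y, s, init=None):
    """Exchange one active and one inactive column while the loss decreases.
    Simplified stand-in: every single exchange is tried by brute force."""
    p = len(X[0])
    active = set(init) if init is not None else set(range(s))
    best = rss(X, y, active)
    improved = True
    while improved:  # terminates: the loss strictly decreases, subsets are finite
        improved = False
        for out in sorted(active):
            for inn in sorted(set(range(p)) - active):
                cand = (active - {out}) | {inn}
                r = rss(X, y, cand)
                if r < best - 1e-12:
                    active, best, improved = cand, r, True
                    break
            if improved:
                break
    return active, best

def select_size(X, y, max_size):
    """Pick the subset size by a BIC-type score (an assumed stand-in criterion)."""
    n = len(y)
    scored = []
    for s in range(1, max_size + 1):
        active, r = splice(X, y, s)
        scored.append((n * math.log(r / n) + s * math.log(n), s, active))
    return min(scored, key=lambda t: t[0])

# Synthetic data (illustrative only): columns 1 and 4 carry the signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [3 * row[1] - 2 * row[4] + rng.gauss(0, 0.1) for row in X]
active, loss = splice(X, y, 2, init=[0, 2])
print(sorted(active), loss)
```

Because each accepted exchange strictly lowers the loss and there are finitely many subsets of a given size, the loop stops after finitely many steps at a subset that no single exchange can improve. This mirrors, in a much cruder form, the "stable solution in finite steps" described in the abstract. On this synthetic data the signal is strong enough that the search is expected to end at columns 1 and 4 even from a poor starting subset.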

Subjects

Machine Learning; Statistical Models

Keywords

best-subset selection; high dimensional; splicing

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Machine Learning | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: Proc Natl Acad Sci U S A | Year of publication: 2020 | Document type: Article | Country of affiliation: China