Model-free feature screening for categorical outcomes: Nonlinear effect detection and false discovery rate control.
PLoS One; 14(5): e0217463, 2019.
Article | MEDLINE | ID: mdl-31150453
ABSTRACT
Feature screening has become an essential prerequisite for the analysis of high-dimensional genomic data, as it is effective in reducing dimensionality and removing redundant features. However, existing feature screening methods have mostly relied on the assumptions of linear effects and of independence (or weak dependence) between features, which may be inappropriate in practice. In this paper, we consider the problem of selecting continuous features for a categorical outcome from high-dimensional data. We propose a powerful statistical procedure that consists of two steps: a nonparametric significance test based on edge counts, and a multiple testing procedure with dependence adjustment for false discovery rate control. The new method offers two novelties. First, the edge-count test directly targets distributional differences between groups and is therefore sensitive to nonlinear effects. Second, we relax the independence assumption and adapt Efron's procedure to adjust for the dependence between features. The performance of the proposed procedure, in terms of statistical power and false discovery rate, is illustrated with simulated data. We apply the new method to three genomic datasets to identify genes associated with colon, cervical and prostate cancers.
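The first step of the procedure, an edge-count test for one continuous feature against a categorical outcome, can be illustrated with a short sketch. It is an assumption-laden illustration, not the authors' implementation: it uses one common form of edge-count statistic (the number of minimum-spanning-tree edges joining observations from different outcome classes), which for a single feature reduces to comparing neighbours in sorted order, and it calibrates that statistic by label permutation. The function names `between_group_edges` and `edge_count_pvalue` are hypothetical, and the dependence-adjusted FDR step (Efron's procedure) is not reproduced here.

```python
import numpy as np


def _adjacent_mismatches(sorted_labels):
    """Count consecutive pairs whose class labels differ."""
    return int(np.count_nonzero(sorted_labels[1:] != sorted_labels[:-1]))


def between_group_edges(x, y):
    """Edge-count statistic for one continuous feature (hypothetical helper).

    Assumption: the graph is the Euclidean minimum spanning tree of x. In one
    dimension that tree links each value to its neighbour in sorted order, so
    its n - 1 edges are consecutive sorted pairs. Ties follow the stable sort.
    Works for any number of classes, since any label mismatch is counted.
    """
    order = np.argsort(np.asarray(x, dtype=float), kind="stable")
    return _adjacent_mismatches(np.asarray(y)[order])


def edge_count_pvalue(x, y, n_perm=999, rng=None):
    """One-sided permutation p-value for the edge-count statistic.

    If the class-conditional distributions of x differ in any way (location,
    scale or shape), same-class observations tend to be neighbours, so the
    number of between-class edges is unusually SMALL.
    """
    rng = np.random.default_rng(rng)
    order = np.argsort(np.asarray(x, dtype=float), kind="stable")
    sorted_labels = np.asarray(y)[order]
    observed = _adjacent_mismatches(sorted_labels)
    # Sorting once and shuffling the sorted labels gives the same permutation
    # distribution as recomputing the statistic on (x, permuted y).
    count = 0
    for _ in range(n_perm):
        if _adjacent_mismatches(rng.permutation(sorted_labels)) <= observed:
            count += 1
    # Add-one correction keeps the p-value in (0, 1].
    return (1 + count) / (n_perm + 1)


if __name__ == "__main__":
    # Two classes with equal population means but different spreads: a
    # scale difference that a mean-based (linear) screen would not target.
    gen = np.random.default_rng(0)
    y = np.repeat([0, 1], 100)
    x = np.concatenate([gen.normal(0.0, 1.0, 100), gen.normal(0.0, 3.0, 100)])
    print("edge-count permutation p-value:", edge_count_pvalue(x, y, rng=1))
```

Screening a high-dimensional dataset would apply such a test to each feature column and pass the resulting p-values to a multiple testing step; in the procedure described above, that step is the dependence-adjusted adaptation of Efron's method rather than an independence-based correction.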
Databases: MEDLINE
Main subject: Genomics / Data Analysis / Neoplasms
Study type: Diagnostic_studies / Screening_studies
Limit: Humans
Language: English
Journal: PLoS One
Journal subject: Science / Medicine
Year: 2019
Document type: Article
Country of affiliation: United States