Your browser doesn't support javascript.
loading
Investigating the role of Simpson's paradox in the analysis of top-ranked features in high-dimensional bioinformatics datasets.
Freitas, Alex A.
Afiliação
  • Freitas AA; University of Kent, Kent, UK.
Brief Bioinform ; 21(2): 421-428, 2020 03 23.
Article em En | MEDLINE | ID: mdl-30629111
An important problem in bioinformatics consists of identifying the most important features (or predictors), among a large number of features in a given classification dataset. This problem is often addressed by using a machine learning-based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The large number of studies in this area has, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson's paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditional on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson's paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson's paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
Assuntos
Palavras-chave

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Biologia Computacional / Conjuntos de Dados como Assunto / Aprendizado de Máquina Tipo de estudo: Prognostic_studies Idioma: En Revista: Brief Bioinform Assunto da revista: BIOLOGIA / INFORMATICA MEDICA Ano de publicação: 2020 Tipo de documento: Article

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Biologia Computacional / Conjuntos de Dados como Assunto / Aprendizado de Máquina Tipo de estudo: Prognostic_studies Idioma: En Revista: Brief Bioinform Assunto da revista: BIOLOGIA / INFORMATICA MEDICA Ano de publicação: 2020 Tipo de documento: Article
...