Results 1 - 2 of 2
1.
IEEE Trans Cybern; 52(5): 2942-2954, 2022 May.
Article in English | MEDLINE | ID: mdl-33027013

ABSTRACT

Feature selection is one of the most frequent tasks in data mining applications. Its ability to remove useless and redundant features, which improves classification performance and yields knowledge about a given problem, makes feature selection a common first step in data mining. In many feature selection applications, the results of different feature selection processes must be combined. The two most common scenarios are ensembles of feature selectors and the scaling up of feature selection methods through data division. The standard procedure is to record the number of times every feature has been selected as a vote for that feature and then evaluate different selection thresholds with a given criterion to obtain the final subset of selected features. However, this method is suboptimal because the relationships among the features are not considered in the voting process. Two redundant features may be selected a similar number of times, owing to the different sets of instances used in each run, so a voting scheme would tend to select both of them. In this article, we present a new approach: instead of using only the number of times a feature has been selected, it considers how many times features have been selected together by a feature selection algorithm. The proposal constructs an undirected graph in which the vertices are the features and the edges count the number of times every pair of features has been selected together. This graph is then used to select the best subset of features, avoiding the redundancy introduced by the voting scheme. The proposal improves on the standard voting scheme in both ensembles of feature selectors and data division methods for scaling up feature selection.
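The co-selection graph described in this abstract lends itself to a short sketch. The Python fragment below is a minimal illustration, not the authors' implementation: it assumes each run's output is available as a list of selected feature indices, and greedy_subset is a hypothetical extraction rule (the abstract does not specify how the final subset is read off the graph) that rewards frequently selected features while penalizing pairs with high co-selection counts.

```python
from itertools import combinations
from collections import Counter

def co_selection_graph(runs):
    """Build the co-selection counts described in the abstract.

    runs: list of iterables, each holding the feature indices selected
    by one run (one ensemble member or one data chunk).
    Returns per-feature vote counts and per-pair co-selection counts
    (the edge weights of the undirected feature graph).
    """
    votes = Counter()
    edges = Counter()  # frozenset({i, j}) -> times i and j were selected together
    for selected in runs:
        selected = sorted(set(selected))
        votes.update(selected)
        for i, j in combinations(selected, 2):
            edges[frozenset((i, j))] += 1
    return votes, edges

def greedy_subset(votes, edges, k, penalty=1.0):
    """Hypothetical greedy extraction of k features from the graph:
    repeatedly add the feature with the most votes, minus a penalty
    proportional to its co-selection with already chosen features."""
    chosen = []
    candidates = set(votes)
    while candidates and len(chosen) < k:
        def score(f):
            redundancy = sum(edges.get(frozenset((f, c)), 0) for c in chosen)
            return votes[f] - penalty * redundancy
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Illustrative use with three runs over five features:
runs = [[0, 1, 3], [0, 2, 3], [1, 3, 4]]
votes, edges = co_selection_graph(runs)
subset = greedy_subset(votes, edges, k=2)
# Feature 3 (selected in every run) enters first; the remaining
# candidates tie after the redundancy penalty and break arbitrarily.
```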


Subjects
Algorithms; Data Mining; Research Design
2.
Neural Netw; 118: 175-191, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31299623

ABSTRACT

Prototype selection is one of the most common preprocessing tasks in data mining applications. The vast amounts of data that must be handled in practical problems make the removal of noisy, redundant or useless instances a convenient first step in any real-world application. Many algorithms have been proposed for prototype selection, but for difficult problems a single method is unlikely to achieve the desired performance. As in classification, ensembles of prototype selectors have been proposed to overcome the limitations of single algorithms. In ensembles of prototype selectors, the usual combination method is a voting scheme coupled with an acceptance threshold. However, this method is suboptimal because the relationships among the prototypes are not taken into account. In this paper, we propose a different approach that considers not only the number of times every prototype has been selected but also the subsets of prototypes that are selected together. With this additional information we develop GEEBIES, a new way of combining the results of ensembles of prototype selectors. On a large set of problems, we show that our proposal outperforms the standard boosting approach. A way of scaling our method up to large datasets is also proposed and experimentally tested.
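The baseline that GEEBIES improves on, a voting scheme with an acceptance threshold, is fully described in the abstract and can be sketched directly; GEEBIES itself is not specified here, so the code below implements only that standard combination. The function name and array layout are illustrative assumptions.

```python
import numpy as np

def vote_combine(selections, n_prototypes, threshold=0.5):
    """Standard combination for an ensemble of prototype selectors:
    each run votes for the instances it keeps, and an instance is
    retained if its vote fraction reaches the acceptance threshold.

    selections: list of boolean sequences, one per ensemble member,
    each of length n_prototypes (True = instance kept by that run).
    Returns a boolean mask over the original instances.
    """
    votes = np.zeros(n_prototypes)
    for kept in selections:
        votes += np.asarray(kept, dtype=float)
    return votes / len(selections) >= threshold

# Illustrative use: three selectors over five instances.
selections = [
    [True, True, False, True, False],
    [True, False, False, True, True],
    [False, True, False, True, False],
]
mask = vote_combine(selections, n_prototypes=5, threshold=0.5)
# mask -> [True, True, False, True, False]
```

GEEBIES replaces this per-prototype tally with information about which prototypes were selected jointly, in the same spirit as the co-selection graph of the previous entry; according to the abstract, that joint information is what lets it outperform the standard boosting approach.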


Subjects
Algorithms; Databases, Factual; Proof of Concept Study; Data Mining/standards; Databases, Factual/standards