A fast supervised density-based discretization algorithm for classification tasks in the medical domain.
Aristodimou, Aristos; Diavastos, Andreas; Pattichis, Constantinos S.
Affiliation
  • Aristodimou A; Department of Computer Science, University of Cyprus, Nicosia, Cyprus.
  • Diavastos A; School of Computing, National University of Singapore, Singapore, Republic of Singapore.
  • Pattichis CS; Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Catalunya.
Health Informatics J ; 28(1): 14604582211065397, 2022.
Article in English | MEDLINE | ID: mdl-35170333
Discretization is a preprocessing technique that converts continuous features into categorical ones. This step is essential for algorithms that cannot handle continuous data as input. In addition, in the big data era, a discretizer must be able to discretize data efficiently. In this paper, a new supervised density-based discretization (DBAD) algorithm is proposed that satisfies these requirements. The algorithm was evaluated on 11 datasets covering a wide range of data in the medical domain. It was tested against three state-of-the-art discretizers using three classifiers with different characteristics. A parallel version of the algorithm was evaluated using two synthetic big datasets. In the majority of the tests, the algorithm performed statistically similarly to or better than the three discretization algorithms it was compared with. Additionally, it was faster than the other discretizers in all of the tests. Finally, the parallel version of DBAD shows almost linear speedup for a Message Passing Interface (MPI) implementation (9.64× for 10 nodes), while a hybrid MPI/OpenMP implementation improves execution time by 35.3× for 10 nodes and 6 threads per node.
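The scaling figures quoted in the abstract can be read as parallel efficiency, that is, speedup divided by the number of workers. The following is a minimal sketch of that arithmetic, not code from the paper. It assumes that both figures are speedups relative to a single-node, single-thread run and that the hybrid configuration counts as 10 × 6 = 60 workers; the abstract does not state either point explicitly. The function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return speedup divided by worker count (1.0 would be perfectly linear scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures taken from the abstract; the baseline and worker counts are assumptions.
mpi_eff = parallel_efficiency(9.64, 10)         # MPI only, 10 nodes
hybrid_eff = parallel_efficiency(35.3, 10 * 6)  # MPI/OpenMP, 10 nodes x 6 threads per node

print(f"MPI efficiency:    {mpi_eff:.1%}")
print(f"Hybrid efficiency: {hybrid_eff:.1%}")
```

Under these assumptions, the MPI-only run retains roughly 96% efficiency, which is consistent with the "almost linear" description, while the hybrid run trades per-worker efficiency (roughly 59%) for a much larger absolute speedup.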

Full text: 1 Database: MEDLINE Main subject: Algorithms / Computational Biology Language: En Publication year: 2022 Document type: Article