Search | BVS Integralidade em Saúde

Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets.

Chaudhry, Muhammad Umar; Yasir, Muhammad; Asghar, Muhammad Nabeel; Lee, Jee-Hyong.

Entropy (Basel) ; 22(10), 2020 Sep 29.

Article in English | MEDLINE | ID: mdl-33286862

ABSTRACT

Complexity and high dimensionality are inherent concerns of big data. Feature selection has gained prime importance in coping with this issue by reducing the dimensionality of datasets. The compromise between maximum classification accuracy and minimum dimensionality remains an unsolved puzzle. Recently, Monte Carlo Tree Search (MCTS)-based techniques have been proposed that have attained great success in feature selection by constructing a binary feature selection tree and efficiently focusing on the most valuable features in the feature space. However, one challenging problem associated with such approaches is the tradeoff between the tree search and the number of simulations. With a limited number of simulations, the tree might not reach sufficient depth, thus biasing feature subset selection towards randomness. In this paper, a new algorithm for feature selection is proposed in which multiple feature selection trees are built iteratively in a recursive fashion. The state space of every successor feature selection tree is smaller than that of its predecessor, thus increasing the impact of tree search in selecting the best features while keeping the number of MCTS simulations fixed. Experiments are performed on 16 benchmark datasets for validation purposes. We also compare the performance with state-of-the-art methods in the literature in terms of both classification accuracy and feature selection ratio.
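The recursive idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the stopping rule, so the loop below simply stops when a round fails to shrink the candidate pool, and the single tree search is replaced by a plain random-sampling stand-in (`random_search`). The names `toy_reward`, `RELEVANT`, `random_search`, and `recursive_selection` are hypothetical, and the reward is a toy score rather than classifier accuracy.

```python
import random

# Hypothetical toy problem: features 0, 2 and 5 are "relevant".
RELEVANT = frozenset({0, 2, 5})

def toy_reward(subset):
    """Toy stand-in for classifier accuracy: reward hits, penalize extras."""
    hits = len(subset & RELEVANT) / len(RELEVANT)
    return hits - 0.05 * len(subset - RELEVANT)

def random_search(candidates, reward_fn, n_simulations, rng):
    """Stand-in for ONE tree search: sample subsets of `candidates`
    with a fixed simulation budget and keep the best one seen."""
    best_subset = frozenset(candidates)
    best_reward = reward_fn(best_subset)
    for _ in range(n_simulations):
        subset = frozenset(f for f in candidates if rng.random() < 0.5)
        reward = reward_fn(subset)
        if reward > best_reward:
            best_subset, best_reward = subset, reward
    return best_subset, best_reward

def recursive_selection(n_features, reward_fn, n_simulations=200,
                        max_rounds=10, seed=0):
    """Repeat the search, each time restricted to the previous winner.

    With n candidates the state space has 2**n subsets, so every round
    that shrinks the pool spends the same budget on a smaller space.
    """
    rng = random.Random(seed)
    candidates = frozenset(range(n_features))
    best_reward = reward_fn(candidates)
    history = [len(candidates)]  # pool size per round
    for _ in range(max_rounds):
        subset, reward = random_search(sorted(candidates), reward_fn,
                                       n_simulations, rng)
        # Assumed stopping rule: stop when the pool no longer shrinks.
        if len(subset) >= len(candidates) or reward < best_reward:
            break
        candidates, best_reward = subset, reward
        history.append(len(candidates))
    return candidates, best_reward, history

if __name__ == "__main__":
    selected, reward, history = recursive_selection(12, toy_reward)
    print(sorted(selected), round(reward, 3), history)
```

The `history` list records the candidate pool size per round, which makes the shrinking state space (from `2**12` subsets downwards in the usage example) easy to inspect.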

MOTiFS: Monte Carlo Tree Search Based Feature Selection.

Chaudhry, Muhammad Umar; Lee, Jee-Hyong.

Entropy (Basel) ; 20(5), 2018 May 20.

Article in English | MEDLINE | ID: mdl-33265475

ABSTRACT

Given the increasing size and complexity of datasets needed to train machine learning algorithms, it is necessary to reduce the number of features required to achieve high classification accuracy. This paper presents a novel and efficient approach based on the Monte Carlo Tree Search (MCTS) to find the optimal feature subset in the feature space. The algorithm searches for the best feature subset by combining the benefits of tree search with random sampling. Starting from an empty node, the tree is incrementally built by adding nodes representing the inclusion or exclusion of the features in the feature space. Every iteration leads to a feature subset following the tree and default policies. The accuracy of the classifier on the feature subset is used as the reward and propagated backwards to update the tree. Finally, the subset with the highest reward is chosen as the best feature subset. The efficiency and effectiveness of the proposed method are validated by experiments on many benchmark datasets. The results are also compared with significant methods in the literature, demonstrating the superiority of the proposed method.
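The search loop described in this abstract (tree policy, expansion, default policy, reward backpropagation, best subset kept) can be sketched as below. This is a minimal illustration, not the MOTiFS implementation: the UCT selection rule, the exploration constant `c`, the 50/50 random default policy, and all names (`Node`, `uct_child`, `mcts_feature_selection`, `toy_reward`) are assumptions, and a toy score replaces the classifier accuracy used as the reward in the paper.

```python
import math
import random

# Hypothetical toy problem: features 0, 2 and 5 are "relevant".
RELEVANT = frozenset({0, 2, 5})

def toy_reward(subset):
    """Toy stand-in for classifier accuracy on a feature subset."""
    hits = len(subset & RELEVANT) / len(RELEVANT)
    return hits - 0.05 * len(subset - RELEVANT)

class Node:
    """A node at depth d has already decided features 0..d-1."""
    def __init__(self, depth, parent=None):
        self.depth = depth
        self.parent = parent
        self.children = {}       # action (0 = exclude, 1 = include) -> Node
        self.visits = 0
        self.total_reward = 0.0

def uct_child(node, c):
    """Pick the (action, child) pair with the highest UCT score (assumed rule)."""
    def score(item):
        _, child = item
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children.items(), key=score)

def mcts_feature_selection(n_features, reward_fn, n_simulations=500,
                           c=0.5, seed=0):
    rng = random.Random(seed)
    root = Node(depth=0)  # empty node: no feature decided yet
    best_subset, best_reward = frozenset(), float("-inf")
    for _ in range(n_simulations):
        node, subset = root, []
        # Tree policy: descend while both children already exist.
        while node.depth < n_features and len(node.children) == 2:
            action, node = uct_child(node, c)
            if action == 1:
                subset.append(node.depth - 1)
        # Expansion: add one untried include/exclude child.
        if node.depth < n_features:
            action = rng.choice([a for a in (0, 1) if a not in node.children])
            child = Node(node.depth + 1, parent=node)
            node.children[action] = child
            if action == 1:
                subset.append(node.depth)
            node = child
        # Default policy: decide the remaining features at random.
        for feature in range(node.depth, n_features):
            if rng.random() < 0.5:
                subset.append(feature)
        reward = reward_fn(frozenset(subset))
        if reward > best_reward:
            best_subset, best_reward = frozenset(subset), reward
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_subset, best_reward

if __name__ == "__main__":
    subset, reward = mcts_feature_selection(8, toy_reward)
    print(sorted(subset), round(reward, 3))
```

In a real setting, `reward_fn` would train and score a classifier on the chosen columns, which is by far the most expensive step; that is why the number of simulations is the budget that matters.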

Information Density Enhancement Using Lossy Compression in DNA Data Storage.

Seo, Seongjun; Tandon, Anshula; Lee, Keun Woo; Lee, Jee-Hyong; Park, Sung Ha.

Adv Mater ; : e2403071, 2024 May 23.

Article in English | MEDLINE | ID: mdl-38779945

ABSTRACT

This study develops two deoxyribonucleic acid (DNA) lossy compression models, Models A and B, to encode grayscale images into DNA sequences, enhance information density, and enable high-fidelity image recovery. These models, distinguished by their handling of pixel domains and interpolation methods, offer a novel approach to DNA data storage. Model A processes pixels in overlapped domains using linear interpolation (LI), whereas Model B uses non-overlapped domains with nearest-neighbor interpolation (NNI). In a comparative analysis with Joint Photographic Experts Group (JPEG) compression, the DNA lossy compression models demonstrate competitive advantages in terms of information density and image quality restoration. The application of these models to the Modified National Institute of Standards and Technology (MNIST) dataset reveals their efficiency and the recognizability of decompressed images, which is validated by convolutional neural network (CNN) performance. In particular, Model B2, a version of Model B, emerges as an effective method for balancing high information density (more than 20 times the typical density of two bits per nucleotide) with reasonably good image quality. These findings highlight the potential of DNA-based data storage systems for high-density and efficient compression, indicating a promising future for biological data storage solutions.
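To make the notion of information density concrete, the sketch below encodes a tiny grayscale image by storing one value per non-overlapping pixel domain and restores it with nearest-neighbor interpolation. It is a loose illustration inspired by the description of Model B, not the paper's model: the block-mean encoding, the 2-bit mapping `A, C, G, T`, the requirement that image sides be multiples of the block size, and all function names are assumptions.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T

def byte_to_dna(value):
    """Encode one 8-bit value as four nucleotides (2 bits each)."""
    return "".join(NUCLEOTIDES[(value >> shift) & 0b11]
                   for shift in (6, 4, 2, 0))

def dna_to_byte(seq):
    """Decode four nucleotides back into one 8-bit value."""
    value = 0
    for base in seq:
        value = (value << 2) | NUCLEOTIDES.index(base)
    return value

def compress(image, block):
    """Store only the mean of each non-overlapping block x block domain.

    Assumes the image height and width are multiples of `block`.
    """
    height, width = len(image), len(image[0])
    strand = []
    for r in range(0, height, block):
        for c in range(0, width, block):
            pixels = [image[i][j]
                      for i in range(r, r + block)
                      for j in range(c, c + block)]
            strand.append(byte_to_dna(round(sum(pixels) / len(pixels))))
    return "".join(strand)

def decompress(strand, height, width, block):
    """Nearest-neighbor restore: each pixel takes its domain's stored value."""
    cols = width // block
    values = [dna_to_byte(strand[i:i + 4]) for i in range(0, len(strand), 4)]
    return [[values[(i // block) * cols + (j // block)] for j in range(width)]
            for i in range(height)]

def bits_per_nucleotide(height, width, strand, bit_depth=8):
    """Original image bits represented per stored nucleotide."""
    return height * width * bit_depth / len(strand)

if __name__ == "__main__":
    image = [[10, 10, 200, 200],
             [10, 10, 200, 200],
             [30, 30, 40, 40],
             [30, 30, 40, 40]]
    strand = compress(image, block=2)
    print(strand, bits_per_nucleotide(4, 4, strand))
    print(decompress(strand, 4, 4, block=2))
```

Under these assumptions the density grows with the square of the block size, while image quality falls as each domain is flattened to a single value; that is the tradeoff the abstract refers to.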

Machine Learning for Detection of Safety Signals From Spontaneous Reporting System Data: Example of Nivolumab and Docetaxel.

Bae, Ji-Hwan; Baek, Yeon-Hee; Lee, Jeong-Eun; Song, Inmyung; Lee, Jee-Hyong; Shin, Ju-Young.

Front Pharmacol ; 11: 602365, 2020.

Article in English | MEDLINE | ID: mdl-33628176

ABSTRACT

Introduction: Various methods have been implemented to detect adverse drug reaction (ADR) signals. However, the applicability of machine learning methods has not yet been fully evaluated. Objective: To evaluate the feasibility of machine learning algorithms in detecting ADR signals of nivolumab and docetaxel, new and old anticancer agents. Methods: We conducted a safety surveillance study of nivolumab and docetaxel using the Korea national spontaneous reporting database from 2009 to 2018. We constructed a novel input dataset for each study drug comprising known ADRs that were listed in the drug labels and unknown ADRs. Given the known ADRs, we trained machine learning algorithms (gradient boosting machine [GBM] and random forest [RF]) and evaluated their predictive performance in generating safety signals, compared with traditional disproportionality analysis methods (reporting odds ratio [ROR] and information component [IC]), by using the area under the curve (AUC). Each method was then implemented to detect new safety signals from the unknown ADR datasets. Results: Of all methods implemented, GBM achieved the best average predictive performance (AUC: 0.97 and 0.93 for nivolumab and docetaxel). The AUC achieved by each of the other methods was 0.95 and 0.92 (RF), 0.55 and 0.51 (ROR), and 0.49 and 0.48 (IC) for the respective drugs. GBM detected an additional 24 and nine signals for nivolumab and 82 and 76 for docetaxel compared with ROR and IC, respectively, from the unknown ADR datasets. Conclusion: The machine learning algorithm based on GBM performed better and detected more new ADR signals than traditional disproportionality analysis methods.
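For readers unfamiliar with the two disproportionality measures named in this abstract, the sketch below computes them from a 2x2 contingency table of spontaneous reports. It uses the common textbook forms (ROR as a cross-product ratio with a Wald 95% interval, IC as the base-2 log of observed over expected counts, without shrinkage); the abstract does not state which exact variants or signal thresholds the study used, so those choices, the function names, and the example counts are assumptions.

```python
import math

def reporting_odds_ratio(a, b, c, d):
    """ROR with a 95% Wald confidence interval.

    a: reports with the drug and the event
    b: reports with the drug, other events
    c: reports with other drugs and the event
    d: reports with other drugs, other events
    All four counts must be positive for this simple form.
    """
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(ROR)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper

def information_component(a, b, c, d):
    """IC = log2(observed / expected), no shrinkage term (assumed variant)."""
    total = a + b + c + d
    expected = (a + b) * (a + c) / total
    return math.log2(a / expected)

if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    a, b, c, d = 20, 80, 10, 890
    ror, lower, upper = reporting_odds_ratio(a, b, c, d)
    ic = information_component(a, b, c, d)
    print(f"ROR={ror:.2f} (95% CI {lower:.2f}-{upper:.2f}), IC={ic:.2f}")
```

A drug-event pair is typically flagged when the lower bound of the interval exceeds a threshold (often 1 for ROR and 0 for IC), so both measures rely only on report counts, whereas the machine learning models in the study can draw on additional input features.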
