Classification and Explanation for Intrusion Detection System Based on Ensemble Trees and SHAP Method.

Le, Thi-Thu-Huong; Kim, Haeyoung; Kang, Hyoeun; Kim, Howon

Affiliation

Le TT; IoT Research Center, Pusan National University, Busan 609735, Korea.
Kim H; Faculty of Information Technology, Hung Yen University of Technology and Education, Hung Yen 160000, Vietnam.
Kang H; School of Computer Science and Engineering, Pusan National University, Busan 609735, Korea.
Kim H; School of Computer Science and Engineering, Pusan National University, Busan 609735, Korea.

Sensors (Basel); 22(3), 2022 Feb 03.

Article in En | MEDLINE | ID: mdl-35161899

ABSTRACT

In recent years, the research community has designed and developed many methods for intrusion detection systems (IDS), and they have achieved perfect detection rates on IDS datasets. Deep neural networks (DNNs) are a representative example and are widely applied in IDS. However, DNN architectures are becoming increasingly complex and demand substantial hardware computing resources. In addition, it is difficult for humans to obtain explanations for the decisions these DNN models make on large IoT-based IDS datasets. Many proposed IDS methods have not reached practical deployment because they give cybersecurity experts no explanation that would help them optimize their decisions in light of the judgments of the IDS models. This paper aims to improve the attack detection performance of IDS on large IoT-based IDS datasets and to provide explanations of machine learning (ML) model predictions. The proposed ML-based IDS method follows the ensemble trees approach, using decision tree (DT) and random forest (RF) classifiers, which do not require high computing resources for training. Two large datasets are used for the experimental evaluation: NF-BoT-IoT-v2 and NF-ToN-IoT-v2, new versions of the original BoT-IoT and ToN-IoT datasets built on the feature set of a NetFlow meter. The IoTDS20 dataset is also used for experiments. Furthermore, SHapley Additive exPlanations (SHAP), an eXplainable AI (XAI) method, is applied to explain and interpret the classification decisions of the DT and RF models. This not only interprets the final decision of the ensemble tree approach effectively but also helps cybersecurity experts quickly optimize and evaluate the correctness of their judgments based on the explained results.
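
The idea behind the SHAP step can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the SHAP library: it computes exact Shapley values by brute force for a small hand-written decision tree. The feature names, the tree thresholds, the background flows, and the helper names (`toy_tree`, `coalition_value`, `shapley_values`) are all assumptions made up for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical flow features (illustrative names, not taken from the datasets).
FEATURES = ["pkts_per_sec", "bytes_per_pkt", "dst_port_is_rare"]


def toy_tree(x):
    """Hand-written decision tree returning an attack score in [0, 1]."""
    if x[0] > 100:  # high packet rate
        return 0.9 if x[1] < 80 else 0.6
    return 0.7 if x[2] == 1 else 0.1


def coalition_value(model, x, background, subset):
    """Mean model output when features in `subset` are taken from x
    and all other features are taken from each background row."""
    total = 0.0
    for b in background:
        mixed = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition of features.
    Cost grows exponentially with the number of features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                with_i = coalition_value(model, x, background, s | {i})
                without_i = coalition_value(model, x, background, s)
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical background flows and one flow to explain.
background = [[10, 500, 0], [20, 400, 0], [150, 60, 0], [30, 300, 1]]
flow = [200, 40, 1]

phi = shapley_values(toy_tree, flow, background)
base = coalition_value(toy_tree, flow, background, set())

for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
print(f"base score: {base:.3f}, model score: {toy_tree(flow):.3f}")
```

The contributions sum to the difference between the model score for the explained flow and the average score over the background flows, which is what lets an analyst read each value as that feature's share of the decision. Enumeration is only feasible for a handful of features, so a practical setup for tree models would rely on a tree-specific algorithm instead.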

Subject(s)

Machine Learning; Neural Networks, Computer; Computer Security; Humans

Key words

SHapley Additive exPlanations (SHAP); decision tree; ensemble trees; explainable AI (XAI); intrusion detection systems (IDS); random forest

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Neural Networks, Computer / Machine Learning | Type of study: Diagnostic_studies / Prognostic_studies | Limits: Humans | Language: En | Journal: Sensors (Basel) | Year: 2022 | Document type: Article | Country of publication: Switzerland
