Evaluation of Federated Learning in Phishing Email Detection.

Thapa, Chandra; Tang, Jun Wen; Abuadbba, Alsharif; Gao, Yansong; Camtepe, Seyit; Nepal, Surya; Almashor, Mahathir; Zheng, Yifeng

Affiliation

Thapa C; Commonwealth Scientific and Industrial Research Organisation, Data61, Sydney 2122, Australia.
Tang JW; School of Chemical Engineering, The University of New South Wales, Sydney 2052, Australia.
Abuadbba A; Commonwealth Scientific and Industrial Research Organisation, Data61, Sydney 2122, Australia.
Gao Y; Cyber Security Cooperative Research Centre, Australian Capital Territory 2604, Australia.
Camtepe S; Commonwealth Scientific and Industrial Research Organisation, Data61, Sydney 2122, Australia.
Nepal S; Commonwealth Scientific and Industrial Research Organisation, Data61, Sydney 2122, Australia.
Almashor M; Commonwealth Scientific and Industrial Research Organisation, Data61, Sydney 2122, Australia.
Zheng Y; Cyber Security Cooperative Research Centre, Australian Capital Territory 2604, Australia.

Sensors (Basel); 23(9), 2023 Apr 27.

Article in En | MEDLINE | ID: mdl-37177549

ABSTRACT

The use of artificial intelligence (AI) to detect phishing emails depends primarily on large-scale centralized datasets, which exposes it to a myriad of privacy, trust, and legal issues. Moreover, organizations have been loath to share emails, given the risk of leaking commercially sensitive information. Consequently, it has been difficult to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly federated learning (FL), is a desideratum. Although FL is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection in multi-organization collaborations. To the best of our knowledge, the work herein was the first to investigate the use of FL in phishing email detection. This study built upon deep neural network models, particularly a recurrent convolutional neural network (RNN) and bidirectional encoder representations from transformers (BERT), for phishing email detection. We analyzed the FL-based learning performance in various settings, including (i) balanced and asymmetrical data distributions among organizations and (ii) scalability. Our results showed that FL achieved performance comparable to centralized learning in phishing email detection for balanced datasets and low organizational counts. Moreover, we observed a variation in performance when increasing the organizational counts. For a fixed total email dataset, the global RNN-based model had a 1.8% accuracy decrease when the organizational counts were increased from 2 to 10. In contrast, BERT accuracy increased by 0.6% when the organizational counts were increased from 2 to 5. However, when we increased the overall email dataset by introducing new organizations into the FL framework, the organizational-level performance improved through faster convergence. In addition, the overall global model performance of FL suffered from highly unstable outputs when the email dataset distribution was highly asymmetric.
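The abstract does not state which aggregation rule the study used. As a minimal sketch only, the following assumes the widely used sample-size-weighted averaging (FedAvg-style) to illustrate how a balanced and an asymmetric split of emails among organizations weight the global model differently. The function name `fed_avg`, the two-parameter toy "models", and the email counts are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors.

    Assumption: FedAvg-style aggregation; the abstract does not confirm
    that this is the rule used in the study.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one weight vector per client size")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        # Each organization contributes in proportion to its share of emails.
        for i, w in enumerate(weights):
            global_weights[i] += (n / total) * w
    return global_weights


# Hypothetical toy parameters from two organizations after local training.
org_a = [1.0, 0.0]
org_b = [0.0, 1.0]

# Balanced split: both organizations hold the same number of emails.
balanced = fed_avg([org_a, org_b], [500, 500])

# Asymmetric split: organization A holds most of the emails and
# therefore dominates the aggregated global model.
asymmetric = fed_avg([org_a, org_b], [900, 100])
```

In this toy setting the balanced split weights both organizations equally, while the asymmetric split pulls the global parameters toward the larger organization. This is one plausible intuition for the sensitivity to asymmetric distributions reported above, not a reproduction of the study's experiments.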

Keywords

bidirectional encoder representations from transformers (BERT); federated learning; phishing email detection; recurrent neural network

Full text: 1 | Collections: 01-international | Database: MEDLINE | Study type: Diagnostic_studies | Language: En | Journal: Sensors (Basel) | Publication year: 2023 | Document type: Article | Affiliation country: Australia