Búsqueda | BVS CLAP/SMR-OPS/OMS

Evaluation of Federated Learning in Phishing Email Detection.

Thapa, Chandra; Tang, Jun Wen; Abuadbba, Alsharif; Gao, Yansong; Camtepe, Seyit; Nepal, Surya; Almashor, Mahathir; Zheng, Yifeng.

Sensors (Basel) ; 23(9)2023 Apr 27.

Artículo en Inglés | MEDLINE | ID: mdl-37177549

RESUMEN

The use of artificial intelligence (AI) to detect phishing emails is primarily dependent on large-scale centralized datasets, which has opened it up to a myriad of privacy, trust, and legal issues. Moreover, organizations have been loath to share emails, given the risk of leaking commercially sensitive information. Consequently, it has been difficult to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly federated learning (FL), is a desideratum. As it is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection within the context of multi-organization collaborations. To the best of our knowledge, the work herein was the first to investigate the use of FL in phishing email detection. This study focused on building upon a deep neural network model, particularly recurrent convolutional neural network (RNN) and bidirectional encoder representations from transformers (BERT), for phishing email detection. We analyzed the FL-entangled learning performance in various settings, including (i) a balanced and asymmetrical data distribution among organizations and (ii) scalability. Our results corroborated the comparable performance statistics of FL in phishing email detection to centralized learning for balanced datasets and low organizational counts. Moreover, we observed a variation in performance when increasing the organizational counts. For a fixed total email dataset, the global RNN-based model had a 1.8% accuracy decrease when the organizational counts were increased from 2 to 10. In contrast, BERT accuracy increased by 0.6% when increasing organizational counts from 2 to 5. However, if we increased the overall email dataset by introducing new organizations in the FL framework, the organizational level performance improved by achieving a faster convergence speed. In addition, FL suffered in its overall global model performance due to highly unstable outputs if the email dataset distribution was highly asymmetric.

The Compression Optimality of Asymmetric Numeral Systems.

Pieprzyk, Josef; Duda, Jarek; Pawlowski, Marcin; Camtepe, Seyit; Mahboubi, Arash; Morawiecki, Pawel.

Entropy (Basel) ; 25(4)2023 Apr 17.

Artículo en Inglés | MEDLINE | ID: mdl-37190460

RESUMEN

Source coding has a rich and long history. However, a recent explosion of multimedia Internet applications (such as teleconferencing and video streaming, for instance) renews interest in fast compression that also squeezes out as much redundancy as possible. In 2009 Jarek Duda invented his asymmetric numeral system (ANS). Apart from having a beautiful mathematical structure, it is very efficient and offers compression with a very low coding redundancy. ANS works well for any symbol source statistics, and it has become a preferred compression algorithm in the IT industry. However, designing an ANS instance requires a random selection of its symbol spread function. Consequently, each ANS instance offers compression with a slightly different compression ratio. The paper investigates the compression optimality of ANS. It shows that ANS is optimal for any symbol sources whose probability distribution is described by natural powers of 1/2. We use Markov chains to calculate ANS state probabilities. This allows us to precisely determine the ANS compression rate. We present two algorithms for finding ANS instances with a high compression ratio. The first explores state probability approximations in order to choose ANS instances with better compression ratios. The second algorithm is a probabilistic one. It finds ANS instances whose compression ratios can be made as close to the best ratio as required. This is done at the expense of the number Î¸ of internal random "coin" tosses. The algorithm complexity is O(Î¸L3), where L is the number of ANS states. The complexity can be reduced to O(Î¸Llog2L) if we use a fast matrix inversion. If the algorithm is implemented on a quantum computer, its complexity becomes O(Î¸(log2L)3).

Performance and Information Leakage in Splitfed Learning and Multi-Head Split Learning in Healthcare Data and Beyond.

Joshi, Praveen; Thapa, Chandra; Camtepe, Seyit; Hasanuzzaman, Mohammed; Scully, Ted; Afli, Haithem.

Methods Protoc ; 5(4)2022 Jul 13.

Artículo en Inglés | MEDLINE | ID: mdl-35893586

RESUMEN

Machine learning (ML) in healthcare data analytics is attracting much attention because of the unprecedented power of ML to extract knowledge that improves the decision-making process. At the same time, laws and ethics codes drafted by countries to govern healthcare data are becoming stringent. Although healthcare practitioners are struggling with an enforced governance framework, we see the emergence of distributed learning-based frameworks disrupting traditional-ML-model development. Splitfed learning (SFL) is one of the recent developments in distributed machine learning that empowers healthcare practitioners to preserve the privacy of input data and enables them to train ML models. However, SFL has some extra communication and computation overheads at the client side due to the requirement of client-side model synchronization. For a resource-constrained client side (hospitals with limited computational powers), removing such conditions is required to gain efficiency in the learning. In this regard, this paper studies SFL without client-side model synchronization. The resulting architecture is known as multi-head split learning (MHSL). At the same time, it is important to investigate information leakage, which indicates how much information is gained by the server related to the raw data directly out of the smashed data-the output of the client-side model portion-passed to it by the client. Our empirical studies examine the Resnet-18 and Conv1-D architecture model on the ECG and HAM-10000 datasets under IID data distribution. The results find that SFL provides 1.81% and 2.36% better accuracy than MHSL on the ECG and HAM-10000 datasets, respectively (for cut-layer value set to 1). Analysis of experimentation with various client-side model portions demonstrates that it has an impact on the overall performance. With an increase in layers in the client-side model portion, SFL performance improves while MHSL performance degrades. Experiment results also demonstrate that information leakage provided by mutual information score values in SFL is more than MHSL for ECG and HAM-10000 datasets by 2×10-5 and 4×10-3, respectively.

Precision health data: Requirements, challenges and existing techniques for data security and privacy.

Thapa, Chandra; Camtepe, Seyit.

Comput Biol Med ; 129: 104130, 2021 02.

Artículo en Inglés | MEDLINE | ID: mdl-33271399

RESUMEN

Precision health leverages information from various sources, including omics, lifestyle, environment, social media, medical records, and medical insurance claims to enable personalized care, prevent and predict illness, and precise treatments. It extensively uses sensing technologies (e.g., electronic health monitoring devices), computations (e.g., machine learning), and communication (e.g., interaction between the health data centers). As health data contain sensitive private information, including the identity of patient and carer and medical conditions of the patient, proper care is required at all times. Leakage of these private information affects the personal life, including bullying, high insurance premium, and loss of job due to the medical history. Thus, the security, privacy of and trust on the information are of utmost importance. Moreover, government legislation and ethics committees demand the security and privacy of healthcare data. Besides, the public, who is the data source, always expects the security, privacy, and trust of their data. Otherwise, they can avoid contributing their data to the precision health system. Consequently, as the public is the targeted beneficiary of the system, the effectiveness of precision health diminishes. Herein, in the light of precision health data security, privacy, ethical and regulatory requirements, finding the best methods and techniques for the utilization of the health data, and thus precision health is essential. In this regard, firstly, this paper explores the regulations, ethical guidelines around the world, and domain-specific needs. Then it presents the requirements and investigates the associated challenges. Secondly, this paper investigates secure and privacy-preserving machine learning methods suitable for the computation of precision health data along with their usage in relevant health projects. Finally, it illustrates the best available techniques for precision health data security and privacy with a conceptual system model that enables compliance, ethics clearance, consent management, medical innovations, and developments in the health domain.

Asunto(s)

Medicina de Precisión , Privacidad , Seguridad Computacional , Confidencialidad , Humanos

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA