ABSTRACT
The COVID-19 pandemic is marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains, largely driven by factors such as increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all viral sequences added to GISAID, a public database for sharing viral genetic sequences, in a given week. DeepAutoCoV is trained and validated on global and country-specific data sets assembled from over 16 million Spike protein sequences sampled over a period of ~4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%-3%), with median lead times of 4-17 weeks, and predicts FDLs between ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as an FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness, and may provide significant insights for optimizing 'pre-emptive' public health intervention strategies.
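A minimal sketch of the general idea described above, not the authors' DeepAutoCoV implementation: an autoencoder is trained only on k-mer features of Spike sequences from lineages that were already dominant, and new sequences whose reconstruction error is unusually high are flagged as candidate FDLs. The toy sequences, the 2-mer featurization, and the 99th-percentile threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
alphabet = list("ACDEFGHIKLMNPQRSTVWY")
past = ["".join(rng.choice(alphabet, 60)) for _ in range(200)]   # "normal" Spike sequences
new = ["".join(rng.choice(alphabet, 60)) for _ in range(20)]     # incoming sequences to screen

# Represent each protein by k-mer (here 2-mer) counts.
vec = CountVectorizer(analyzer="char", ngram_range=(2, 2))
X_past = vec.fit_transform(past).toarray().astype(float)
X_new = vec.transform(new).toarray().astype(float)

# Train a small autoencoder (input -> bottleneck -> input) on normal data only.
ae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), max_iter=2000, random_state=0)
ae.fit(X_past, X_past)

def reconstruction_error(model, X):
    return np.mean((model.predict(X) - X) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(ae, X_past), 99)
flags = reconstruction_error(ae, X_new) > threshold   # True -> candidate future dominant lineage
print(flags)
```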
Subjects
COVID-19, Deep Learning, SARS-CoV-2, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, COVID-19/virology, COVID-19/epidemiology, Humans, Spike Glycoprotein, Coronavirus/genetics, Forecasting/methods, Pandemics
ABSTRACT
High-frequency (HF) signals are ubiquitous in the industrial world and are of great use for monitoring industrial assets. Most deep-learning tools are designed for inputs of fixed and/or very limited size, and many successful applications of deep learning in the industrial context use extracted features as inputs, i.e., compact representations of the original signal that are obtained manually and often arduously. In this paper, we propose a fully unsupervised deep-learning framework that extracts a meaningful and sparse representation of raw HF signals. We embed in our architecture important properties of the fast discrete wavelet transform (FDWT): 1) the cascade algorithm; 2) the conjugate quadrature filter property that links together the wavelet, scaling, and transposed filter functions; and 3) coefficient denoising. Using deep learning, we make this architecture fully learnable: both the wavelet bases and the wavelet coefficient denoising become learnable. To achieve this objective, we propose an activation function that performs a learnable hard thresholding of the wavelet coefficients. With our framework, the denoising FDWT becomes a fully learnable unsupervised tool that requires no pre- or postprocessing and no prior knowledge of wavelet transforms. We demonstrate the benefits of embedding all these properties on three machine-learning tasks performed on open-source sound datasets. We perform an ablation study of the impact of each property on the performance of the architecture, achieve results well above baseline, and outperform other state-of-the-art methods.
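A hedged sketch of the thresholding idea named above, not the paper's exact formulation: a learnable, smooth relaxation of hard thresholding applied to wavelet coefficients. The parameter names, initial values, and the sigmoid gate are illustrative assumptions; as the sharpness parameter grows, the gate approaches a true hard threshold.

```python
import torch
import torch.nn as nn

class LearnableHardThreshold(nn.Module):
    def __init__(self, init_tau=0.1, init_alpha=10.0):
        super().__init__()
        self.tau = nn.Parameter(torch.tensor(init_tau))      # threshold level (learnable)
        self.alpha = nn.Parameter(torch.tensor(init_alpha))  # gate sharpness (learnable)

    def forward(self, coeffs):
        # Sigmoid gate ~ 1 when |coeff| >> tau and ~ 0 when |coeff| << tau,
        # so small coefficients are suppressed, approximating hard thresholding.
        gate = torch.sigmoid(self.alpha * (coeffs.abs() - self.tau))
        return coeffs * gate

# Usage: denoise one level of wavelet detail coefficients.
act = LearnableHardThreshold()
detail = torch.randn(4, 128)   # toy batch of detail coefficients
denoised = act(detail)
print(denoised.shape)
```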
ABSTRACT
Copy-number variations (CNVs), which refer to deletions and duplications of chromosomal segments, represent a significant source of variation among individuals, contributing to human evolution and being implicated in various diseases ranging from mental illness and developmental disorders to cancer. Despite the development of several methods for detecting copy number variations based on next-generation sequencing (NGS) data, achieving robust detection performance for CNVs with arbitrary coverage and amplitude remains challenging due to the inherent complexity of sequencing samples. In this paper, we propose an alternative method called OTSUCNV for CNV detection on whole genome sequencing (WGS) data. This method utilizes a newly designed adaptive sequence segmentation algorithm and an OTSU-based CNV prediction algorithm, which does not rely on any distribution assumptions or involve complex outlier factor calculations. As a result, the effective detection of CNVs is achieved with lower computational complexity. The experimental results indicate that the proposed method demonstrates outstanding performance, and hence it may be used as an effective tool for CNV detection.
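A minimal sketch (not the OTSUCNV implementation) of the core idea of OTSU-based CNV calling: Otsu's threshold splits per-bin read-depth deviations into "normal" and "CNV candidate" groups without any distributional assumptions. The bin values and the synthetic deletion below are illustrative.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w = hist.astype(float) / hist.sum()
    best_t, best_var = centers[0], -1.0
    for i in range(1, nbins):
        w0, w1 = w[:i].sum(), w[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (w[:i] * centers[:i]).sum() / w0
        m1 = (w[i:] * centers[i:]).sum() / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, centers[i]
    return best_t

rng = np.random.default_rng(1)
depth = rng.normal(30, 3, size=1000)      # per-bin read depth along a chromosome
depth[400:420] = rng.normal(15, 3, 20)    # a simulated heterozygous deletion

deviation = np.abs(depth - np.median(depth))
t = otsu_threshold(deviation)
cnv_bins = np.where(deviation > t)[0]     # candidate CNV bins
print(cnv_bins)
```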
Subjects
Algorithms, DNA Copy Number Variations, Humans, Sequence Analysis, DNA/methods, High-Throughput Nucleotide Sequencing/methods, Whole Genome Sequencing
ABSTRACT
Anomaly detection in medical imaging, particularly within the realm of magnetic resonance imaging (MRI), stands as a vital area of research with far-reaching implications across various medical fields. This review meticulously examines the integration of artificial intelligence (AI) in anomaly detection for MR images, spotlighting its transformative impact on medical diagnostics. We delve into the forefront of AI applications in MRI, exploring advanced machine learning (ML) and deep learning (DL) methodologies that are pivotal in enhancing the precision of diagnostic processes. The review provides a detailed analysis of preprocessing, feature extraction, classification, and segmentation techniques, alongside a comprehensive evaluation of commonly used metrics. Further, this paper explores the latest developments in ensemble methods and explainable AI, offering insights into future directions and potential breakthroughs. This review synthesizes current insights, offering a valuable guide for researchers, clinicians, and medical imaging experts. It highlights AI's crucial role in improving the precision and speed of detecting key structural and functional irregularities in MRI. Our exploration of innovative techniques and trends furthers MRI technology development, aiming to refine diagnostics, tailor treatments, and elevate patient care outcomes. LEVEL OF EVIDENCE: 5 TECHNICAL EFFICACY: Stage 1.
ABSTRACT
BACKGROUND: Malaria continues to pose a significant health threat. Rapid identification of malaria infections and the deployment of active surveillance tools are crucial for achieving malaria elimination in regions where malaria is endemic, such as certain areas of Thailand. In this study, an anomaly detection system is introduced as an early warning mechanism for potential malaria outbreaks in countries like Thailand. METHODS: Unsupervised clustering-based and time series-based anomaly detection algorithms are developed and compared to identify abnormal malaria activity in Thailand. Additionally, a user interface tailored for anomaly detection is designed, enabling the Thai malaria surveillance team to utilize these algorithms and visualize regions exhibiting unusual malaria patterns. RESULTS: Nine distinct anomaly detection algorithms were developed. Their efficacy in pinpointing verified outbreaks was assessed using malaria case data from Thailand spanning 2012 to 2022. The historical average threshold-based anomaly detection method triggered three times fewer alerts while correctly identifying the same number of verified outbreaks as the current method used in Thailand. A limitation of this analysis is the small number of verified outbreaks; further consultation with the Division of Vector Borne Disease could help identify more verified outbreaks. The developed dashboard, designed specifically for anomaly detection, allows disease surveillance professionals to easily identify and visualize unusual malaria activity at a provincial level across Thailand. CONCLUSION: An enhanced early warning system is proposed to bolster malaria elimination efforts for countries with a malaria profile similar to Thailand's. The developed anomaly detection algorithms, after thorough comparison, have been optimized for integration with the current malaria surveillance infrastructure. An anomaly detection dashboard for Thailand was built and supports early detection of abnormal malaria activity. In summary, the proposed early warning system enhances the identification of provinces at risk of outbreaks and offers easy integration with Thailand's established malaria surveillance framework.
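A hedged sketch of a historical-average threshold detector like the one described above: a province-week is flagged when its case count exceeds the mean plus two standard deviations of the same calendar week in previous years. The data frame columns, the 2-SD rule, and the toy outbreak are illustrative assumptions, not Thailand's exact surveillance rule.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
records = pd.DataFrame({
    "province": "Tak",
    "year": np.repeat(np.arange(2012, 2023), 52),
    "week": np.tile(np.arange(1, 53), 11),
    "cases": rng.poisson(20, 52 * 11),
})
records.loc[(records.year == 2022) & (records.week == 30), "cases"] = 80  # toy outbreak

def flag_week(df, year, week, min_history=3):
    """Flag a week if cases exceed mean + 2*SD of the same week in earlier years."""
    history = df[(df.week == week) & (df.year < year)]["cases"]
    if len(history) < min_history:
        return False
    threshold = history.mean() + 2 * history.std()
    current = df[(df.year == year) & (df.week == week)]["cases"].iloc[0]
    return current > threshold

print(flag_week(records, 2022, 30))  # True -> potential outbreak alert
```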
Assuntos
Malária , Humanos , Tailândia/epidemiologia , Malária/epidemiologia , Malária/prevenção & controle , Algoritmos , Análise por Conglomerados , Surtos de DoençasRESUMO
OBJECTIVES: Conventional autoverification rules evaluate analytes independently, potentially missing unusual patterns of results indicative of errors such as serum contamination by collection tube additives. This study assessed whether multivariate anomaly detection algorithms could enhance the detection of such errors. METHODS: Multivariate Gaussian, k-nearest neighbours (KNN) distance, and one-class support vector machine (SVM) anomaly detection models, along with conventional limit checks, were developed using a training dataset of 127,451 electrolyte, urea, and creatinine (EUC) results, with a 5% flagging rate targeted for all approaches. The models were compared with limit checks for their ability to detect atypical EUC results from samples spiked with additives from collection tubes: EDTA, fluoride, sodium citrate, or acid citrate dextrose (n=200 per contaminant). The study additionally assessed the ability of the models to identify 127,449 single-analyte errors, a potential weakness of multivariate models. RESULTS: The KNN distance and SVM models outperformed limit checks for detecting all contaminants (p-values <0.05). The multivariate Gaussian model did not surpass limit checks for detecting EDTA contamination but was superior for detecting the other additives. All models surpassed limit checks for identifying single-analyte errors, with the KNN distance model demonstrating the highest overall sensitivity. CONCLUSIONS: Multivariate anomaly detection models, particularly the KNN distance model, were superior to the conventional approach for detecting serum contamination and single-analyte errors. Developing multivariate approaches to autoverification is warranted to optimise error detection and improve patient safety.
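A minimal sketch (on assumed, synthetic data) of the KNN-distance approach described above: each result set is scored by its mean distance to its k nearest neighbours in the training data, and the alarm threshold is set at the 95th percentile so that roughly 5% of routine results are flagged. The feature panel, k, and the contamination pattern are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
# Toy "routine" EUC panels: Na, K, Cl, HCO3, urea, creatinine.
means = np.array([140, 4.2, 102, 25, 6.0, 80.0])
sds = np.array([3, 0.4, 3, 2, 2.0, 20.0])
train = rng.normal(means, sds, size=(5000, 6))

scaler = StandardScaler().fit(train)
knn = NearestNeighbors(n_neighbors=10).fit(scaler.transform(train))

def knn_score(X):
    dist, _ = knn.kneighbors(scaler.transform(X))
    return dist.mean(axis=1)

threshold = np.percentile(knn_score(train), 95)   # ~5% flagging rate

# A toy EDTA-like pattern: plausible sodium but grossly elevated potassium; it is
# the multivariate combination, not any single analyte, that should trigger the flag.
suspect = np.array([[139, 9.5, 101, 24, 6.2, 82]])
print(knn_score(suspect) > threshold)             # True -> flag for review
```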
Subjects
Algorithms, Humans, Multivariate Analysis, Clinical Chemistry Tests/standards, Clinical Chemistry Tests/methods, Support Vector Machine, Urea/blood, Urea/analysis, Creatinine/blood
ABSTRACT
OBJECTIVES: Patient-based real-time quality control (PBRTQC) is an alternative tool for laboratories that has gained increasing attention. Despite the progress made by using various algorithms, the problems of data volume imbalance between in-control and out-of-control results, as well as the issue of variation remain challenges. We propose a novel integrated framework using anomaly detection and graph neural network, combining clinical variables and statistical algorithms, to improve the error detection performance of patient-based quality control. METHODS: The testing results of three representative analytes (sodium, potassium, and calcium) and eight independent variables of patients (test date, time, gender, age, department, patient type, and reference interval limits) were collected. Graph-based anomaly detection network was modeled and used to generate control limits. Proportional and random errors were simulated for performance evaluation. Five mainstream PBRTQC statistical algorithms were chosen for comparison. RESULTS: The framework of a patient-based graph anomaly detection network for real-time quality control (PGADQC) was established and proven feasible for error detection. Compared with classic PBRTQC, the PGADQC showed a more balanced performance for both positive and negative biases. For different analytes, the average number of patient samples until error detection (ANPed) of PGADQC decreased variably, and reductions could reach up to approximately 95% at a small bias of 0.02, taking calcium as an example. CONCLUSIONS: The PGADQC is an effective framework for patient-based quality control, integrating statistical and artificial intelligence algorithms. It improves error detection in a data-driven fashion and provides a new approach for PBRTQC from the data science perspective.
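A hedged sketch of one classic PBRTQC comparator of the kind mentioned above (not the PGADQC framework itself): an exponentially weighted moving average (EWMA) of patient results with fixed control limits, used to count how many patient samples pass before a simulated proportional bias is detected (the ANPed metric). The limits, the smoothing constant, and the 2% bias are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(4)
calcium = rng.normal(2.35, 0.10, size=5000)           # in-control patient results (mmol/L)

lam = 0.1
baseline = calcium.mean()
lower, upper = baseline - 0.03, baseline + 0.03        # illustrative control limits

def samples_to_detection(results, bias=1.02):
    """Return the number of patient samples processed before the EWMA alarms."""
    ewma = baseline
    for n, x in enumerate(results * bias, start=1):     # apply a 2% proportional error
        ewma = lam * x + (1 - lam) * ewma
        if not (lower <= ewma <= upper):
            return n
    return None

print(samples_to_detection(rng.normal(2.35, 0.10, size=2000)))
```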
Assuntos
Algoritmos , Controle de Qualidade , Humanos , Redes Neurais de Computação , Feminino , Masculino , Sódio/análise , Sódio/sangue , Cálcio/análise , Cálcio/sangue , Potássio/análise , Potássio/sangue , AdultoRESUMO
OBJECTIVES: Lymphocyte subsets are predictors of disease diagnosis, treatment, and prognosis. Determination of lymphocyte subsets is usually carried out by flow cytometry. Despite recent advances in flow cytometry analysis, manual gating of most flow cytometry data remains challenging, as it is labor-intensive, time-consuming, and error-prone. This study aimed to develop an automated method to identify lymphocyte subsets. METHODS: We propose a method that combines knowledge-driven and data-driven approaches to gate automatically and achieve subset identification. To improve accuracy and stability, we implemented Loop Adjustment Gating to optimize the gating result of the lymphocyte population. Furthermore, we incorporated an anomaly detection mechanism to issue warnings for samples that might not have been successfully analyzed, ensuring the quality of the results. RESULTS: The evaluation showed a 99.2% correlation between our method's results and manual analysis on a dataset of 2,000 individual cases from lymphocyte subset assays. Our proposed method attained 97.7% accuracy for all cases and 100% for the high-confidence cases. With our automated method, 99.1% of manual labor can be saved when reviewing only the low-confidence cases, while the average turnaround time required is only 29 s, a reduction of 83.7%. CONCLUSIONS: Our proposed method achieves high accuracy on flow cytometry data from lymphocyte subset assays. Additionally, it saves manual labor and reduces turnaround time, giving it strong potential for application in the laboratory.
Subjects
Flow Cytometry, Lymphocyte Subsets, Lymphocyte Subsets/classification, Lymphocyte Subsets/cytology, Flow Cytometry/methods, Flow Cytometry/standards, Automation, Laboratory, Reproducibility of Results, Humans
ABSTRACT
Image anomaly detection (AD) is widely researched in computer vision. Detecting anomalies in high-dimensional data such as images remains challenging when the data are noisy, have complex backgrounds, and are imbalanced or incomplete. Some deep learning methods can be trained in an unsupervised way, mapping the original input into low-dimensional manifolds through dimension reduction so that anomalies deviate more strongly from normal samples. However, a single low-dimensional latent space has limited representational power, because noise and irrelevant features are mapped into the same space and the learned manifolds are therefore not discriminative enough for detecting anomalies. To address this problem, a new autoencoder framework, named LSP-CAE, is proposed in this study with two trainable, mutually orthogonal, complementary subspaces in the latent space, obtained through a latent subspace projection (LSP) mechanism. Specifically, latent subspace projection is used to train the latent image subspace (LIS) and the latent kernel subspace (LKS) in the latent space of the autoencoder-like model, which enhances the ability to learn different features of the input instance. The features of normal data are projected into the latent image subspace, while the latent kernel subspace is trained end-to-end to capture information irrelevant to normal features. To verify the generality and effectiveness of the proposed method, we replace the convolutional network with a fully-connected network and conduct experiments on real-world medical datasets. At test time, an anomaly score based on the projection norms in the two subspaces is used to evaluate anomalies. Our proposed method achieves the best performance on four public datasets in comparison with state-of-the-art methods.
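A hedged sketch of the scoring idea described above, not the LSP-CAE architecture itself: the latent code is projected onto two complementary subspaces, and the anomaly score compares the projection norms; normal inputs should project mostly onto the "image" subspace, while anomalies leak into the complementary "kernel" subspace. The dimensions and the score definition are illustrative assumptions.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 32, 24                       # kernel subspace dimension = 8

class LatentSubspaceScore(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        # A learnable basis; QR keeps the two subspaces mutually orthogonal.
        self.basis = nn.Parameter(torch.randn(latent_dim, latent_dim))

    def forward(self, x):
        z = self.encoder(x)
        q, _ = torch.linalg.qr(self.basis)            # orthonormal columns
        proj = z @ q                                  # coordinates in the learned basis
        z_img, z_ker = proj[:, :image_dim], proj[:, image_dim:]
        # Larger score -> more energy outside the normal-data subspace -> more anomalous.
        return z_ker.norm(dim=1) / (z_img.norm(dim=1) + 1e-8)

model = LatentSubspaceScore()
scores = model(torch.randn(16, 784))                  # toy flattened images
print(scores)
```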
Subjects
Algorithms
ABSTRACT
OBJECTIVE: Physicians and clinicians rely on data contained in electronic health records (EHRs), as recorded by health information technology (HIT), to make informed decisions about their patients. The reliability of HIT systems in this regard is critical to patient safety. Consequently, better tools are needed to monitor the performance of HIT systems for potential hazards that could compromise the collected EHRs, which in turn could affect patient safety. In this paper, we propose a new framework for detecting anomalies in EHRs using sequences of clinical events. This new framework, EHR-Bidirectional Encoder Representations from Transformers (EHR-BERT), is motivated by gaps in existing deep-learning-related methods, including high false negatives, sub-optimal accuracy, higher computational cost, and the risk of information loss. EHR-BERT is an innovative framework rooted in the BERT architecture, meticulously tailored to navigate the hurdles of the contemporary BERT method, thus enhancing anomaly detection in EHRs for healthcare applications. METHODS: The EHR-BERT framework was designed using the Sequential Masked Token Prediction (SMTP) method. This approach treats EHRs as natural language sentences and iteratively masks input tokens during both training and prediction stages. This method facilitates the learning of EHR sequence patterns in both directions for each event and identifies anomalies based on deviations from the normal execution models trained on EHR sequences. RESULTS: Extensive experiments on large EHR datasets across various medical domains demonstrate that EHR-BERT markedly improves upon existing models. It significantly reduces the number of false positives and enhances the detection rate, thus bolstering the reliability of anomaly detection in electronic health records. This improvement is attributed to the model's ability to minimize information loss and maximize data utilization effectively. CONCLUSION: EHR-BERT showcases immense potential in decreasing medical errors related to anomalous clinical events, positioning itself as an indispensable asset for enhancing patient safety and the overall standard of healthcare services. The framework effectively overcomes the drawbacks of earlier models, making it a promising solution for healthcare professionals to ensure the reliability and quality of health data.
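A hedged sketch of the scoring loop implied above, not the EHR-BERT model itself: each clinical event in a sequence is masked in turn, a masked-language model predicts the event at that position, and the sequence's anomaly score grows with how improbable its true events are. The frequency-based `masked_event_probability` below is a toy stand-in for a trained BERT-style model, and the event vocabulary is invented.

```python
import math
from collections import Counter

normal_sequences = [
    ["admit", "triage", "labs", "imaging", "discharge"],
    ["admit", "triage", "labs", "discharge"],
    ["admit", "triage", "imaging", "labs", "discharge"],
] * 100

# Stand-in model: P(event | position) estimated from normal sequences.
position_counts = {}
for seq in normal_sequences:
    for pos, ev in enumerate(seq):
        position_counts.setdefault(pos, Counter())[ev] += 1

def masked_event_probability(pos, event):
    counts = position_counts.get(pos, Counter())
    total = sum(counts.values())
    return (counts[event] + 1) / (total + 100)        # add-one smoothing

def anomaly_score(seq):
    # Mask each position in turn and accumulate negative log-likelihood of the true event.
    return sum(-math.log(masked_event_probability(pos, ev))
               for pos, ev in enumerate(seq)) / len(seq)

print(anomaly_score(["admit", "triage", "labs", "discharge"]))       # low score
print(anomaly_score(["discharge", "imaging", "admit", "triage"]))    # high score
```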
Subjects
Electronic Health Records, Health Information Systems, Humans, Reproducibility of Results, Records, Health Personnel
ABSTRACT
Monitoring surveillance video is time-consuming, and the complexity of typical crowd behaviour in crowded situations makes this even more challenging. This has sparked interest in computer vision-based anomaly detection. This study introduces a new crowd anomaly detection method with two main steps: Visual Attention Detection and Anomaly Detection. The Visual Attention Detection phase uses an Enhanced Bilateral Texture-Based Methodology to pinpoint crucial areas in crowded scenes, improving anomaly detection precision. Next, the Anomaly Detection phase employs an Optimized Deep Maxout Network to robustly identify unusual behaviours. This network's deep learning capabilities are essential for detecting complex patterns in diverse crowd scenarios. To enhance accuracy, the model is trained using the Battle Royale Coalesced Atom Search Optimization (BRCASO) algorithm, which fine-tunes the weights for superior performance, ensuring heightened detection accuracy and reliability. Lastly, the effectiveness of the proposed approach is compared with that of traditional approaches using various performance metrics. The proposed crowd anomaly detection method is implemented in Python. The results show that the proposed model attains a detection accuracy of 97.28% at a learning rate of 90%, which is much higher than the detection accuracy of the other models: ASO = 90.56%, BMO = 91.39%, BES = 88.63%, BRO = 86.98%, and FFLY = 89.59%.
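A hedged sketch of the maxout building block named above (not the full optimized network or the BRCASO training procedure): each maxout unit takes the maximum over several learned linear "pieces", which gives deep maxout networks their flexible, piecewise-linear decision surfaces. The layer sizes and the two-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    def __init__(self, in_features, out_features, pieces=4):
        super().__init__()
        self.pieces = pieces
        self.out_features = out_features
        self.linear = nn.Linear(in_features, out_features * pieces)

    def forward(self, x):
        z = self.linear(x)                                  # (batch, out_features * pieces)
        z = z.view(-1, self.out_features, self.pieces)
        return z.max(dim=2).values                          # max over the linear pieces

classifier = nn.Sequential(Maxout(256, 64), Maxout(64, 16), nn.Linear(16, 2))
logits = classifier(torch.randn(8, 256))   # toy batch of frame-level features
print(logits.shape)                        # torch.Size([8, 2]): normal vs. anomalous
```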
ABSTRACT
With the continuous modernization of water plants, the risk of cyberattacks on them potentially endangers public health and the economic efficiency of water treatment and distribution. This article highlights the importance of developing improved techniques to support cyber risk management for critical water infrastructure, given an evolving threat environment. In particular, we propose a method that uniquely combines machine learning, the theory of belief functions, operational performance metrics, and dynamic visualization to provide the required granularity for attack inference, localization, and impact estimation. We illustrate how the focus on visual domain-aware anomaly exploration leads to performance improvement, more precise anomaly localization, and effective risk prioritization. The proposed elements of the method can be used independently, supporting the exploration of various anomaly detection methods. It can thus facilitate the effective management of operational risk by providing rich context information and bridging the interpretation gap.
ABSTRACT
The number of people who need to use a wheelchair for proper mobility is increasing. The integration of technology into these devices enables the simultaneous and objective assessment of posture, while also facilitating the concurrent monitoring of the functional status of wheelchair users. In this way, both health personnel and the user can be provided with relevant information for the recovery process. This information can be used to adapt patients' rehabilitation early, thereby helping to prevent further musculoskeletal problems as well as risk situations such as ulcers or falls, and promoting a higher quality of life for affected individuals. As a result, this paper presents an organized analysis of the existing postural diagnosis systems for detecting sitting anomalies in the literature. This analysis is divided into the two parts that compose such postural diagnosis: on the one hand, the monitoring devices necessary for the collection of postural data and, on the other hand, the techniques used for anomaly detection. These anomaly detection techniques are explained under two different approaches: the traditional generalized approach followed to date by most works, where anomalies are treated as incorrect postures, and a new individualized approach treating anomalies as changes with respect to the user's normal sitting pattern. In this way, the advantages, limitations and opportunities of the different techniques are analyzed. The main contribution of this overview paper is to synthesize and organize information, identify trends, and provide a comprehensive understanding of sitting posture diagnosis systems, offering researchers an accessible resource for navigating the current state of knowledge in this particular field.
Subjects
Quality of Life, Wheelchairs, Humans, Sitting Position, Posture, Health Personnel
ABSTRACT
As cyber-attacks increase in unencrypted communication environments such as the traditional Internet, protected communication channels based on cryptographic protocols, such as transport layer security (TLS), have been introduced to the Internet. Accordingly, attackers have been carrying out cyber-attacks by hiding themselves in protected communication channels. However, the nature of channels protected by cryptographic protocols makes it difficult to distinguish between normal and malicious network traffic behaviors. This means that traditional anomaly detection models, which rely on features extracted from packets by deep packet inspection (DPI), have been neutralized. Recently, studies on anomaly detection using artificial intelligence (AI) and statistical characteristics of traffic have been proposed as an alternative. In this review, we provide a systematic review of AI-based anomaly detection techniques over encrypted traffic. We set several research questions on the review topic and collected research according to eligibility criteria. Through the screening process and quality assessment, 30 research articles with high suitability were selected from the collected literature for inclusion in the review. We reviewed the selected research in terms of dataset, feature extraction, feature selection, preprocessing, anomaly detection algorithm, and performance indicators. The literature review confirmed that a wide variety of techniques are used for AI-based anomaly detection over encrypted traffic. Some of these techniques are similar to those used for AI-based anomaly detection over unencrypted traffic, while others differ.
ABSTRACT
The Internet's default inter-domain routing system, the Border Gateway Protocol (BGP), remains insecure. Detection techniques are dominated by approaches that involve large numbers of features, parameters, domain-specific tuning, and training, often contributing to an unacceptable computational cost. Efforts to detect anomalous activity in the BGP have been almost exclusively focused on single observable monitoring points and Autonomous Systems (ASs). BGP attacks can exploit and evade these limitations. In this paper, we review and evaluate categories of BGP attacks based on their complexity. Previously identified next-generation BGP detection techniques remain incapable of detecting advanced attacks that exploit single observable detection approaches and those designed to evade public routing monitor infrastructures. Advanced BGP attack detection requires lightweight, rapid capabilities with the capacity to quantify group-level multi-viewpoint interactions, dynamics, and information. We term this approach advanced BGP anomaly detection. This survey evaluates 178 anomaly detection techniques and identifies which are candidates for advanced attack anomaly detection. Preliminary findings from an exploratory investigation of advanced BGP attack candidates are also reported.
ABSTRACT
Existing industrial image anomaly detection techniques predominantly utilize codecs based on convolutional neural networks (CNNs). However, traditional convolutional autoencoders are limited to local features and struggle to assimilate global feature information. Moreover, the generalizability of CNNs enables even certain anomalous regions to be reconstructed; this is particularly evident when normal and abnormal regions, despite having similar pixel values, contain different semantic information, leading to ineffective anomaly detection. Furthermore, collecting abnormal image samples during actual industrial production poses challenges, often resulting in data imbalance. To mitigate these issues, this study proposes an unsupervised anomaly detection model employing the Vision Transformer (ViT) architecture, incorporating a Transformer structure to understand the global context between image blocks and thereby extract a superior representation of feature information. It integrates a memory module to catalog normal sample features, both to counteract anomaly reconstruction issues and to bolster feature representation, and additionally introduces a coordinate attention (CA) mechanism to intensify focus on image features at both spatial and channel dimensions, minimizing feature information loss and thereby enabling more precise anomaly identification and localization. Experiments conducted on two public datasets, MVTec AD and BeanTech AD, substantiate the method's effectiveness, demonstrating an approximate 20% improvement in average image-level AUROC over traditional convolutional encoders.
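A hedged sketch of the memory-module idea mentioned above, in the general style of memory-augmented autoencoders rather than the paper's exact design: encoded patch features are re-expressed as attention-weighted combinations of stored "normal" prototypes, so anomalous patterns cannot be reproduced faithfully and yield larger per-patch reconstruction errors. The sizes and the softmax temperature are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryModule(nn.Module):
    def __init__(self, num_slots=100, dim=256, temperature=0.1):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, dim))  # learned normal prototypes
        self.temperature = temperature

    def forward(self, features):                      # features: (batch, tokens, dim)
        sim = F.normalize(features, dim=-1) @ F.normalize(self.memory, dim=-1).T
        weights = F.softmax(sim / self.temperature, dim=-1)
        return weights @ self.memory                  # re-composed from memory items only

memory = MemoryModule()
tokens = torch.randn(2, 196, 256)                     # toy ViT patch embeddings
recomposed = memory(tokens)
anomaly_map = (tokens - recomposed).pow(2).mean(-1)   # per-patch anomaly score
print(anomaly_map.shape)                              # torch.Size([2, 196])
```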
ABSTRACT
To date, significant progress has been made in the field of railway anomaly detection using technologies such as real-time data analytics, the Internet of Things, and machine learning. As technology continues to evolve, the ability to detect and respond to anomalies in railway systems is once again in the spotlight. However, railway anomaly detection faces challenges related to vast infrastructure, dynamic conditions, aging assets, and adverse environmental conditions on the one hand, and the scale, complexity, and critical safety implications of railway systems on the other. Our study is underpinned by three objectives. Specifically, we aim to identify time series anomaly detection methods applied to railway sensor device data, recognize the advantages and disadvantages of these methods, and evaluate their effectiveness. To address these objectives, the first part of the study involved a systematic literature review and a series of controlled experiments. For the former, we adopted well-established guidelines to structure and visualize the review. In the second part, we investigated the effectiveness of selected machine learning methods. To evaluate the predictive performance of each method, a five-fold cross-validation approach was applied to ensure accuracy and generality. Based on the calculated accuracy, the results show that the top three methods are CatBoost (96%), Random Forest (91%), and XGBoost (90%), whereas the lowest accuracy is observed for One-Class Support Vector Machines (48%), Local Outlier Factor (53%), and Isolation Forest (55%). As the industry moves toward a zero-defect paradigm on a global scale, ongoing research efforts are focused on improving existing methods and developing new ones that contribute to the safety and quality of rail transportation. In this sense, at least three avenues for future research are worth considering: testing richer data sets, hyperparameter optimization, and implementing methods not included in the current study.
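A minimal sketch (with synthetic data) of the evaluation protocol described above: five-fold cross-validated accuracy for several candidate detectors. Scikit-learn stand-ins are used here; CatBoost and XGBoost live in their own packages (catboost.CatBoostClassifier, xgboost.XGBClassifier) and would be scored the same way. The synthetic "sensor window" dataset and class imbalance are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy stand-in for labelled railway sensor windows (1 = anomalous reading).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "SVM (RBF)": SVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```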
ABSTRACT
Detecting violent behavior in videos to ensure public safety and security poses a significant challenge. Precisely identifying and categorizing instances of violence in real-life closed-circuit television footage, which varies across specifications and locations, requires comprehensive understanding and processing of the sequential information embedded in these videos. This study aims to introduce a model that adeptly grasps the spatiotemporal context of videos within diverse settings and specifications of violent scenarios. We propose a method to accurately capture spatiotemporal features linked to violent behaviors using optical flow and RGB data. The approach leverages a Conv3D-based ResNet-3D model as the foundational network, capable of handling high-dimensional video data. The efficiency and accuracy of violence detection are enhanced by integrating an attention mechanism, which assigns greater weight to the most crucial frames within the RGB and optical-flow sequences during instances of violence. Our model was evaluated on the UBI-Fight, Hockey, Crowd, and Movie-Fights datasets; the proposed method outperformed existing state-of-the-art techniques, achieving area under the curve scores of 95.4, 98.1, 94.5, and 100.0 on the respective datasets. Moreover, this research not only has the potential to be applied in real-time surveillance systems but also promises to contribute to a broader spectrum of research in video analysis and understanding.
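A hedged sketch of the frame-attention idea described above, not the full Conv3D/ResNet-3D pipeline: per-frame feature vectors (from the RGB or optical-flow stream) are pooled with learned attention weights so that the frames most indicative of violence dominate the clip-level representation. The feature dimensions and the two-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionPool(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)          # one relevance score per frame

    def forward(self, frame_feats):              # (batch, frames, dim)
        weights = F.softmax(self.scorer(frame_feats), dim=1)   # (batch, frames, 1)
        clip_repr = (weights * frame_feats).sum(dim=1)         # weighted frame pooling
        return clip_repr, weights.squeeze(-1)

pool = TemporalAttentionPool()
rgb_feats = torch.randn(2, 16, 512)              # toy per-frame backbone features
clip_repr, frame_weights = pool(rgb_feats)
logits = nn.Linear(512, 2)(clip_repr)            # violent vs. non-violent
print(frame_weights.shape, logits.shape)
```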
Subjects
Optic Flow, Violence, Computer Systems
ABSTRACT
The recent rapid growth in Internet of Things (IoT) technologies is enriching our daily lives, but significant information security risks in IoT fields have become apparent. In fact, there have been large-scale botnet attacks that exploit undiscovered vulnerabilities, known as zero-day attacks. Several intrusion detection methods based on network traffic monitoring have been proposed to address this issue. These methods employ federated learning to share learned attack information among multiple IoT networks, aiming to improve collective detection capabilities against attacks, including zero-day attacks. Although their ability to detect zero-day attacks with high precision has been confirmed, challenges remain, such as autonomous labeling of attacks from traffic information and attack information sharing between different device types. To resolve these issues, this paper proposes IDAC, a novel intrusion detection method with autonomous attack candidate labeling and federated learning-based attack candidate sharing. The labeling of attack candidates in IDAC is executed using information autonomously extracted from traffic information, and the labeling can also be applied to zero-day attacks. The federated learning-based attack candidate sharing enables candidate aggregation from multiple networks, and attack determination is then executed based on the aggregated similar candidates. Performance evaluations demonstrated that intrusion detection based on attack candidates with IDAC is feasible within networks and achieves detection performance against multiple attacks, including zero-day attacks, comparable to existing methods while suppressing false positives in the extraction of attack candidates. In addition, sharing autonomously extracted attack candidates from multiple networks improves detection performance and reduces the time required for attack detection.
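A hedged sketch of the sharing-and-determination step described above (IDAC's internals are not specified here): each network contributes feature vectors for its autonomously labelled attack candidates, and a candidate is promoted to a confirmed attack when sufficiently similar candidates are observed across several networks. The cosine-similarity threshold, the two-network quorum, and the synthetic candidate vectors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
# Candidate traffic-feature vectors reported by three IoT networks.
candidates = {
    "net_A": rng.normal(0, 1, (5, 8)),
    "net_B": rng.normal(0, 1, (4, 8)),
    "net_C": rng.normal(0, 1, (6, 8)),
}
candidates["net_B"][0] = candidates["net_A"][2] + rng.normal(0, 0.05, 8)  # a shared attack

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def confirmed_attacks(cands, sim_threshold=0.95, quorum=2):
    """Confirm a candidate when similar candidates appear in at least `quorum` networks."""
    confirmed = []
    for net, vecs in cands.items():
        for v in vecs:
            supporting = {other for other, ov in cands.items() if other != net
                          and any(cosine(v, o) > sim_threshold for o in ov)}
            if len(supporting) + 1 >= quorum:
                confirmed.append((net, v))
    return confirmed

print(len(confirmed_attacks(candidates)))   # candidates corroborated across networks
```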
ABSTRACT
In recent years, smart water sensing technology has played a crucial role in water management, addressing the pressing need for efficient monitoring and control of water resources. The challenge in smart water sensing resides in ensuring the reliability and accuracy of the data collected by sensors. Outliers are a well-known problem in smart sensing, as they can compromise the viability of useful analysis and make it difficult to evaluate pertinent data. In this study, we evaluate the performance of four sensors: electrical conductivity (EC), dissolved oxygen (DO), temperature (Temp), and pH. We implement four classical machine learning models: support vector machine (SVM), artificial neural network (ANN), decision tree (DT), and isolation forest (iForest)-based outlier detection as a pre-processing step before visualizing the data. The dataset was collected by a real-time smart water sensing monitoring system installed in Brussels' lakes, rivers, and ponds. The obtained results clearly show that the SVM outperforms the other models, with F1-scores of 98.38% for pH, 96.98% for Temp, 97.88% for DO, and 98.11% for EC. Furthermore, the ANN also achieves strong results, establishing it as a viable alternative.
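A hedged sketch of the outlier-screening step described above, not the deployed pipeline: an Isolation Forest flags suspect readings in a single sensor stream before visualization. The supervised models compared in the study (SVM, ANN, DT) would instead be trained on labelled normal/outlier readings and scored with F1; the contamination rate and synthetic pH values below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
ph = rng.normal(7.4, 0.15, size=1440)             # one day of per-minute pH readings
ph[[100, 640, 1200]] = [2.1, 11.8, 0.3]           # injected sensor faults

iforest = IsolationForest(contamination=0.01, random_state=0)
labels = iforest.fit_predict(ph.reshape(-1, 1))   # -1 = outlier, 1 = inlier

clean = ph[labels == 1]                           # readings kept for analysis and plots
print(np.where(labels == -1)[0])                  # indices of flagged readings
```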