RESUMO
In recent years, cybersecurity has been strengthened through the adoption of processes, mechanisms and rapid sources of indicators of compromise in critical areas. Among the most latent challenges are the detection, classification and eradication of malware and Denial of Service Cyber-Attacks (DoS). The literature has presented different ways to obtain and evaluate malware- and DoS-cyber-attack-related instances, either from a technical point of view or by offering ready-to-use datasets. However, acquiring fresh, up-to-date samples requires an arduous process of exploration, sandbox configuration and mass storage, which may ultimately result in an unbalanced or under-represented set. Synthetic sample generation has shown that the cost associated with setting up controlled environments and time spent on sample evaluation can be reduced. Nevertheless, the process is performed when the observations already belong to a characterized set, totally detached from a real environment. In order to solve the aforementioned, this work proposes a methodology for the generation of synthetic samples of malicious Portable Executable binaries and DoS cyber-attacks. The task is performed via a Reinforcement Learning engine, which learns from a baseline of different malware families and DoS cyber-attack network properties, resulting in new, mutated and highly functional samples. Experimental results demonstrate the high adaptability of the outputs as new input datasets for different Machine Learning algorithms.
RESUMO
In the last decade, face-recognition and -verification methods based on deep learning have increasingly used deeper and more complex architectures to obtain state-of-the-art (SOTA) accuracy. Hence, these architectures are limited to powerful devices that can handle heavy computational resources. Conversely, lightweight and efficient methods have recently been proposed to achieve real-time performance on limited devices and embedded systems. However, real-time face-verification methods struggle with problems usually solved by their heavy counterparts-for example, illumination changes, occlusions, face rotation, and distance to the subject. These challenges are strongly related to surveillance applications that deal with low-resolution face images under unconstrained conditions. Therefore, this paper compares three SOTA real-time face-verification methods for coping with specific problems in surveillance applications. To this end, we created an evaluation subset from two available datasets consisting of 3000 face images presenting face rotation and low-resolution problems. We defined five groups of face rotation with five levels of resolutions that can appear in common surveillance scenarios. With our evaluation subset, we methodically evaluated the face-verification accuracy of MobileFaceNet, EfficientNet-B0, and GhostNet. Furthermore, we also evaluated them with conventional datasets, such as Cross-Pose LFW and QMUL-SurvFace. When examining the experimental results of the three mentioned datasets, we found that EfficientNet-B0 could deal with both surveillance problems, but MobileFaceNet was better at handling extreme face rotation over 80 degrees.
RESUMO
Most of the methods for real-time semantic segmentation do not take into account temporal information when working with video sequences. This is counter-intuitive in real-world scenarios where the main application of such methods is, precisely, being able to process frame sequences as quickly and accurately as possible. In this paper, we address this problem by exploiting the temporal information provided by previous frames of the video stream. Our method leverages a previous input frame as well as the previous output of the network to enhance the prediction accuracy of the current input frame. We develop a module that obtains feature maps rich in change information. Additionally, we incorporate the previous output of the network into all the decoder stages as a way of increasing the attention given to relevant features. Finally, to properly train and evaluate our methods, we introduce CityscapesVid, a dataset specifically designed to benchmark semantic video segmentation networks. Our proposed network, entitled FASSVid improves the mIoU accuracy performance over a standard non-sequential baseline model. Moreover, FASSVid obtains state-of-the-art inference speed and competitive mIoU results compared to other state-of-the-art lightweight networks, with significantly lower number of computations. Specifically, we obtain 71% of mIoU in our CityscapesVid dataset, running at 114.9 FPS on a single NVIDIA GTX 1080Ti and 31 FPS on the NVIDIA Jetson Nano embedded board with images of size 1024×2048 and 512×1024, respectively.
RESUMO
Facial recognition is fundamental for a wide variety of security systems operating in real-time applications. Recently, several deep neural networks algorithms have been developed to achieve state-of-the-art performance on this task. The present work was conceived due to the need for an efficient and low-cost processing system, so a real-time facial recognition system was proposed using a combination of deep learning algorithms like FaceNet and some traditional classifiers like SVM, KNN, and RF using moderate hardware to operate in an unconstrained environment. Generally, a facial recognition system involves two main tasks: face detection and recognition. The proposed scheme uses the YOLO-Face method for the face detection task which is a high-speed real-time detector based on YOLOv3, while, for the recognition stage, a combination of FaceNet with a supervised learning algorithm, such as the support vector machine (SVM), is proposed for classification. Extensive experiments on unconstrained datasets demonstrate that YOLO-Face provides better performance when the face under an analysis presents partial occlusion and pose variations; besides that, it can detect small faces. The face detector was able to achieve an accuracy of over 89.6% using the Honda/UCSD dataset which runs at 26 FPS with darknet-53 to VGA-resolution images for classification tasks. The experimental results have demonstrated that the FaceNet+SVM model was able to achieve an accuracy of 99.7% using the LFW dataset. On the same dataset, FaceNet+KNN and FaceNet+RF achieve 99.5% and 85.1%, respectively; on the other hand, the FaceNet was able to achieve 99.6%. Finally, the proposed system provides a recognition accuracy of 99.1% and 49 ms runtime when both the face detection and classifications stages operate together.
RESUMO
At present, new data sharing technologies, such as those used in the Internet of Things (IoT) paradigm, are being extensively adopted. For this reason, intelligent security controls have become imperative. According to good practices and security information standards, particularly those regarding security in depth, several defensive layers are required to protect information assets. Within the context of IoT cyber-attacks, it is fundamental to continuously adapt new detection mechanisms for growing IoT threats, specifically for those becoming more sophisticated within mesh networks, such as identity theft and cloning. Therefore, current applications, such as Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS), and Security Information and Event Management Systems (SIEM), are becoming inadequate for accurately handling novel security incidents, due to their signature-based detection procedures using the matching and flagging of anomalous patterns. This project focuses on a seldom-investigated identity attack-the Clone ID attack-directed at the Routing Protocol for Low Power and Lossy Networks (RPL), the underlying technology for most IoT devices. Hence, a robust Artificial Intelligence-based protection framework is proposed, in order to tackle major identity impersonation attacks, which classical applications are prone to misidentifying. On this basis, unsupervised pre-training techniques are employed to select key characteristics from RPL network samples. Then, a Dense Neural Network (DNN) is trained to maximize deep feature engineering, with the aim of improving classification results to protect against malicious counterfeiting attempts.
RESUMO
Hand gesture recognition (HGR) takes a central role in human-computer interaction, covering a wide range of applications in the automotive sector, consumer electronics, home automation, and others. In recent years, accurate and efficient deep learning models have been proposed for real-time applications. However, the most accurate approaches tend to employ multiple modalities derived from RGB input frames, such as optical flow. This practice limits real-time performance due to intense extra computational cost. In this paper, we avoid the optical flow computation by proposing a real-time hand gesture recognition method based on RGB frames combined with hand segmentation masks. We employ a light-weight semantic segmentation method (FASSD-Net) to boost the accuracy of two efficient HGR methods: Temporal Segment Networks (TSN) and Temporal Shift Modules (TSM). We demonstrate the efficiency of the proposal on our IPN Hand dataset, which includes thirteen different gestures focused on interaction with touchless screens. The experimental results show that our approach significantly overcomes the accuracy of the original TSN and TSM algorithms by keeping real-time performance.
Assuntos
Gestos , Reconhecimento Automatizado de Padrão , Algoritmos , Mãos , Humanos , Reconhecimento Psicológico , SemânticaRESUMO
The counting of vehicles plays an important role in measuring the behavior patterns of traffic flow in cities, as streets and avenues can get crowded easily. To address this problem, some Intelligent Transport Systems (ITSs) have been implemented in order to count vehicles with already established video surveillance infrastructure. With this in mind, in this paper, we present an on-line learning methodology for counting vehicles in video sequences based on Incremental Principal Component Analysis (Incremental PCA). This incremental learning method allows us to identify the maximum variability (i.e., motion detection) between a previous block of frames and the actual one by using only the first projected eigenvector. Once the projected image is obtained, we apply dynamic thresholding to perform image binarization. Then, a series of post-processing steps are applied to enhance the binary image containing the objects in motion. Finally, we count the number of vehicles by implementing a virtual detection line in each of the road lanes. These lines determine the instants where the vehicles pass completely through them. Results show that our proposed methodology is able to count vehicles with 96.6% accuracy at 26 frames per second on average-dealing with both camera jitter and sudden illumination changes caused by the environment and the camera auto exposure.
RESUMO
In recent years, online social media information has been the subject of study in several data science fields due to its impact on users as a communication and expression channel. Data gathered from online platforms such as Twitter has the potential to facilitate research over social phenomena based on sentiment analysis, which usually employs Natural Language Processing and Machine Learning techniques to interpret sentimental tendencies related to users’ opinions and make predictions about real events. Cyber-attacks are not isolated from opinion subjectivity on online social networks. Various security attacks are performed by hacker activists motivated by reactions from polemic social events. In this paper, a methodology for tracking social data that can trigger cyber-attacks is developed. Our main contribution lies in the monthly prediction of tweets with content related to security attacks and the incidents detected based on â 1 regularization.
RESUMO
This paper proposes a view-invariant gait recognition framework that employs a unique view invariant model that profits from the dimensionality reduction provided by Direct Linear Discriminant Analysis (DLDA). The framework, which employs gait energy images (GEIs), creates a single joint model that accurately classifies GEIs captured at different angles. Moreover, the proposed framework also helps to reduce the under-sampling problem (USP) that usually appears when the number of training samples is much smaller than the dimension of the feature space. Evaluation experiments compare the proposed framework's computational complexity and recognition accuracy against those of other view-invariant methods. Results show improvements in both computational complexity and recognition accuracy.