ABSTRACT
System logs are a crucial component of system maintainability: they record system status and the events needed for troubleshooting and maintenance. Detecting anomalies in system logs is therefore important. Recent research has focused on extracting semantic information from unstructured log messages for log anomaly detection. Because BERT models work well in natural language processing, this paper proposes an approach called CLDTLog, which introduces contrastive learning and dual-objective tasks into a pre-trained BERT model and performs anomaly detection on system logs through a fully connected layer. The approach requires no log parsing and thus avoids the uncertainty that log parsing introduces. We trained the CLDTLog model on two log datasets, HDFS and BGL, and achieved F1 scores of 0.9971 and 0.9999, respectively, outperforming all known methods. In addition, when trained on only 1% of the BGL dataset, CLDTLog still achieves an F1 score of 0.9993, showing excellent generalization performance at a significantly reduced training cost.
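The abstract names contrastive learning and a dual-objective task but does not give the loss functions. As a minimal sketch only, the Python below assumes the two objectives are a softmax cross-entropy term and a supervised contrastive term, combined by a weighted sum. The function names, the `temperature` and `weight` values, and the choice of losses are illustrative assumptions, not CLDTLog's actual implementation.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; `label` is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: same-label embeddings are pulled together."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Scaled cosine similarity between the anchor and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def dual_objective(logits, embeddings, labels, weight=0.5):
    """Assumed combination: mean cross-entropy + weight * contrastive term."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits, labels)) / len(labels)
    return ce + weight * supervised_contrastive(embeddings, labels)


if __name__ == "__main__":
    # Hypothetical toy batch: label 0 = normal log line, label 1 = anomalous.
    labels = [0, 0, 1, 1]
    embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
    print(dual_objective(logits, embeddings, labels))
```

With labels `[0, 0, 1, 1]`, embeddings that cluster by label give a smaller contrastive term than embeddings whose same-label pairs point in different directions, which is the behavior a contrastive objective rewards.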
Subjects
Learning, Natural Language Processing, Semantics, Uncertainty
ABSTRACT
Traffic classification is the first step in network anomaly detection and is essential to network security. However, existing malicious traffic classification methods have several limitations: statistics-based methods are constrained by hand-designed features, and deep learning-based methods are sensitive to the balance and adequacy of the data sets. In addition, existing BERT-based malicious traffic classification methods focus only on the global features of traffic and ignore its time-series features. To address these problems, we propose a BERT-based Time-Series Feature Network (TSFN) model. The model has two modules. The first is a packet encoder module built on the BERT model, which captures the global features of the traffic using the attention mechanism. The second is a temporal feature extraction module built on the LSTM model, which captures the time-series features of the traffic. The global and time-series features of the malicious traffic are then combined into the final feature representation, which better represents the malicious traffic. Experimental results show that the proposed approach effectively improves the accuracy of malicious traffic classification on the publicly available USTC-TFC dataset, reaching an F1 value of 99.50%. This shows that the time-series features in malicious traffic help improve classification accuracy.
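To illustrate why combining the two feature types can help, here is a toy sketch in plain Python. It does not use BERT or an LSTM: mean pooling stands in for the global encoder and an exponential running state stands in for the recurrent module, and all function names and the `decay` parameter are hypothetical. Only the final concatenation step mirrors the fusion described above.

```python
def global_feature(packets):
    """Stand-in for the packet encoder: mean-pool the packet vectors.

    Mean pooling ignores packet order, a deliberate simplification here.
    """
    dim = len(packets[0])
    return [sum(p[k] for p in packets) / len(packets) for k in range(dim)]


def temporal_feature(packets, decay=0.5):
    """Stand-in for the recurrent module: an order-sensitive running state."""
    state = [0.0] * len(packets[0])
    for p in packets:
        # Each step blends the previous state with the current packet,
        # so later packets weigh more than earlier ones.
        state = [decay * s + (1 - decay) * x for s, x in zip(state, p)]
    return state


def fuse(packets):
    """Concatenate global and time-series features into one representation."""
    return global_feature(packets) + temporal_feature(packets)


if __name__ == "__main__":
    # Hypothetical flow of two packets, each already embedded as a 2-d vector.
    flow = [[1.0, 0.0], [0.0, 1.0]]
    print(fuse(flow))
    print(fuse(list(reversed(flow))))
```

Reversing the packet order leaves the global half of the fused vector unchanged but alters the temporal half, so two flows with the same packets in a different order remain distinguishable after fusion.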
ABSTRACT
Universities contribute to economic growth and national competitiveness by equipping students with higher-order thinking and academic skills. Despite large investments in university science, technology, engineering and mathematics (STEM) education, little is known about how the skills of STEM undergraduates compare across countries and by institutional selectivity. Here, we provide direct evidence on these issues by collecting and analysing longitudinal data on tens of thousands of computer science and electrical engineering students in China, India, Russia and the United States. We find stark differences in skill levels and gains among countries and by institutional selectivity. Compared with students in the United States, students in China, India and Russia do not gain critical thinking skills over four years. Furthermore, while students in India and Russia gain academic skills during the first two years, students in China do not. These gaps in skill levels and gains provide insights into the global competitiveness of STEM university students across nations and institutional types.