1.
Data Brief ; 54: 110389, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38646194

ABSTRACT

A user's DNS fingerprint makes it possible to identify a specific network user without knowing their IP address. This method is useful, for example, when examining the behavior of a monitored network user in more depth. In contrast to other studies, this work introduces a dataset for user identification based only on the user's DNS fingerprint, which is built from previously sent DNS queries. We created a large dataset from the real network traffic of a metropolitan Internet service provider. The dataset was created from 2.3 billion DNS queries representing 6.2 million different domain names. The data collection took place over three months, from 12/2023 to 02/2024. The dataset describes user activity in detail, with overall daily activity statistics and detailed 24 h activity statistics. Each dataset record contains a list of 1137 classification attributes. A unique feature of this dataset is the classification of user activity by the categories of content a user accessed. The new dataset can be used to create machine learning models that identify a specific user without direct knowledge of their IP address or additional network location information. The dataset can also serve as a reference dataset for creating DNS fingerprints of users.
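
The idea of a fingerprint built from previously sent DNS queries can be illustrated with a minimal sketch. It is not the authors' method: it assumes a fingerprint is simply the relative frequency of each queried domain name and that fingerprints are compared by cosine similarity. The function names and the example domains are hypothetical.

```python
from collections import Counter
from math import sqrt


def dns_fingerprint(queries):
    """Map each queried domain name to its share of all queries (assumed model)."""
    counts = Counter(q.lower().rstrip(".") for q in queries)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def cosine_similarity(fp_a, fp_b):
    """Cosine similarity between two sparse fingerprints stored as dicts."""
    dot = sum(w * fp_b.get(d, 0.0) for d, w in fp_a.items())
    norm_a = sqrt(sum(w * w for w in fp_a.values()))
    norm_b = sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(unknown_queries, known_fingerprints):
    """Return the known user whose fingerprint is closest to the new queries."""
    fp = dns_fingerprint(unknown_queries)
    return max(
        known_fingerprints,
        key=lambda user: cosine_similarity(fp, known_fingerprints[user]),
    )


# Hypothetical users and domains, for illustration only.
known = {
    "user_a": dns_fingerprint(["news.example", "mail.example", "news.example"]),
    "user_b": dns_fingerprint(["games.example", "video.example"]),
}
best_match = identify(["news.example", "mail.example"], known)
```

A model trained on the dataset's 1137 attributes would be far richer than this frequency vector; the sketch only shows why no IP address is needed once a query history is available.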

2.
Data Brief ; 54: 110522, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38827251

ABSTRACT

This paper introduces a unique dataset that covers thousands of network flow measurements carried out over TCP in a data center environment. TCP is widely used for reliable data transfers and has many different versions. The versions differ in how they deal with link congestion through the congestion control algorithm (CCA). Our dataset represents a unique, comprehensive comparison of the 17 currently used versions of TCP with different CCAs. Each TCP flow was measured exactly 50 times to compensate for measurement instability. The comparison of the TCP versions is based on 18 quantitative attributes representing the parameters of a TCP transmission. Our dataset is suitable for testing and comparing different versions of TCP, creating new CCAs based on machine learning models, or creating and testing machine learning models that identify and optimize the existing versions of TCP.
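
Repeating each measurement lets a comparison rest on a mean and a spread rather than on a single run. The sketch below shows that aggregation step under stated assumptions: the attribute being summarized (throughput), the CCA labels, and the numbers are all illustrative and are not taken from the dataset.

```python
from statistics import mean, stdev


def summarize_runs(runs):
    """Summarize repeated measurements per CCA.

    `runs` maps a CCA label to a list of values of one quantitative
    attribute, one value per repeated measurement (assumed layout).
    """
    summary = {}
    for cca, values in runs.items():
        summary[cca] = {
            "n": len(values),
            "mean": mean(values),
            # Sample standard deviation needs at least two values.
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


# Illustrative values only; a real run would hold 50 repeats per flow.
example_runs = {
    "cca_one": [10.0, 12.0, 14.0],
    "cca_two": [20.0],
}
example_summary = summarize_runs(example_runs)
```

With 50 repeats per flow, the standard deviation gives a direct view of how unstable a given measurement was.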

3.
Data Brief ; 47: 108945, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36798601

ABSTRACT

Encryption of network traffic should guarantee anonymity and prevent potential interception of information. Encrypted virtual private networks (VPNs) are designed to create special data tunnels that allow reliable transmission between networks and/or end users. However, as a number of scientific papers have shown, encryption alone may not be sufficient to secure data transmissions, because certain information may still be exposed. Our team has constructed a large dataset that contains generated encrypted network traffic data. The dataset contains a general network traffic model consisting of different types of network traffic, such as web, email, video conferencing, video streaming, and terminal services. For the same network traffic model, data are measured for different scenarios, i.e., for traffic through different types of VPNs and without VPNs. Additionally, the dataset contains the initial handshake of the VPN connections. The dataset can be used by data scientists working on the classification of encrypted network traffic and encrypted VPNs.

4.
Data Brief ; 49: 109335, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37456120

ABSTRACT

Most of the video content on the Internet today is distributed through online streaming platforms. To ensure user privacy, data transmissions are often encrypted using cryptographic protocols. In previous research, we experimentally validated the idea that the amount of transmitted data belonging to a particular video stream is not constant over time, or changes periodically, and so forms a specific fingerprint. Once the fingerprint of a specific video stream is known, that stream can subsequently be identified. Over several months of intensive work, our team has created a large dataset containing a large number of video streams that were captured by network traffic probes during their playback by end users. The video streams were deliberately chosen to fall thematically into pre-selected categories. We selected two primary platforms for streaming: PeerTube and YouTube. The first platform was chosen because it allows any streaming parameter to be modified, while the second was chosen because it is used by many people worldwide. Our dataset can be used to create and train machine learning models or heuristic algorithms that identify encrypted video streams either by their content or type category or as specific individual streams.
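
The fingerprint described here, the amount of transmitted data varying over time, can be sketched as bytes per fixed time window. The sketch assumes packets are available as (timestamp, size) pairs and uses Pearson correlation to compare two fingerprints. Both choices are illustrative assumptions, not the authors' pipeline.

```python
def traffic_fingerprint(packets, window=1.0):
    """Sum packet sizes into consecutive time windows of `window` seconds.

    `packets` is an iterable of (timestamp_seconds, size_bytes) pairs
    (assumed input format).
    """
    packets = list(packets)
    if not packets:
        return []
    start = min(t for t, _ in packets)
    end = max(t for t, _ in packets)
    bins = [0] * (int((end - start) // window) + 1)
    for t, size in packets:
        bins[int((t - start) // window)] += size
    return bins


def pearson(a, b):
    """Pearson correlation over the common prefix of two sequences."""
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


# Made-up packet trace, for illustration only.
example_packets = [(0.0, 100), (0.5, 50), (1.2, 300), (2.9, 10)]
example_fp = traffic_fingerprint(example_packets)
```

Because only packet sizes and timing are used, the sketch works on encrypted traffic, which is the property the dataset is meant to exploit.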

5.
Data Brief ; 48: 109137, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37128589

ABSTRACT

Christian religious monuments, such as cathedrals, chapels, and temples, are found in many places on our planet. World-famous buildings such as the Notre Dame Cathedral in Paris, Gaudi's Cathedral in Barcelona, and St. Vitus Cathedral in Prague are commonly known, and the many photographs of them available online can be used to build machine learning models that identify them. For little-known buildings, such as small churches in the Czech-German border region, far fewer photographs exist, so similar approaches cannot be used for identification. Based on these facts, our team has compiled a unique dataset for identifying the most important elements of Christian sacral buildings, such as altars, frescoes, and pulpits, which are almost always found in them. Our dataset was manually created from several thousand real photographs. It is well suited, e.g., to creating new machine learning models that identify objects inside sacral buildings or the buildings themselves.
