ABSTRACT
Time series data often display complex, time-varying patterns, which pose significant challenges for effective classification due to data variability, noise, and imbalance. Traditional time series classification techniques frequently fall short in addressing these issues, leading to reduced generalization performance. Therefore, there is a need for innovative methodologies to enhance data diversity and quality. In this paper, we introduce a noise-injection-based feature extraction method for time series classification that addresses these challenges. By employing noise injection for data augmentation, we enhance the diversity of the training data. Using digital signal processing (DSP), we extract key frequency features from time series data through sampling, quantization, and Fourier transformation. This process enhances the quality of the training data, thereby maximizing the model's generalization performance. We demonstrate the superiority of the proposed method by comparing it with existing time series classification models, and we validate its effectiveness through various experimental results, confirming that data augmentation and DSP techniques are potent tools for time series classification. Ultimately, this research presents a robust methodology for time series data analysis and classification, with potential applications across a broad spectrum of data analysis problems.
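A minimal sketch of the two ingredients described above, assuming Gaussian noise injection for augmentation and FFT-magnitude features computed with NumPy; the array shapes, noise level, and number of retained frequency bins are illustrative choices, not the authors' exact configuration.

```python
import numpy as np

def augment_with_noise(X, noise_std=0.05, copies=2, seed=0):
    """Append noisy copies of each series to increase training diversity."""
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(0.0, noise_std * X.std(), size=X.shape)
             for _ in range(copies)]
    return np.concatenate([X] + noisy, axis=0)

def fft_features(X, n_bins=32):
    """Magnitudes of the first n_bins rFFT coefficients as frequency features."""
    spectra = np.abs(np.fft.rfft(X, axis=1))
    return spectra[:, :n_bins]

# X: (n_series, series_length) raw series, y: class labels (toy data here)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 256))
y = rng.integers(0, 2, size=100)

X_aug = augment_with_noise(X, copies=2)   # original series plus 2 noisy copies
y_aug = np.tile(y, 3)                     # labels repeated for the copies
features = fft_features(X_aug)            # frequency-domain training features
```

The resulting feature matrix can then be handed to any standard classifier; the key point is that augmentation happens on the raw series, before the frequency features are extracted.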
ABSTRACT
Deep neural networks must address the dual challenge of delivering high-accuracy predictions and providing user-friendly explanations. While deep models are widely used in time series modeling, deciphering the core principles that govern their outputs remains a significant challenge. This is crucial for fostering trusted models and facilitating domain-expert validation, thereby empowering users and domain experts to apply them confidently in high-risk decision-making contexts (e.g., decision-support systems in healthcare). In this work, we put forward a deep prototype learning model that supports interpretable and manipulable modeling and classification of medical time series (i.e., ECG signals). Specifically, we first optimize the representation of single-heartbeat data using a bidirectional long short-term memory network and an attention mechanism, and then construct prototypes during the training phase. The final classification outcome (i.e., normal sinus rhythm, atrial fibrillation, or other rhythm) is determined by comparing the input with the obtained prototypes. Moreover, the proposed model provides a human-machine collaboration mechanism, allowing domain experts to refine the prototypes by integrating their expertise to further enhance the model's performance (in contrast to the human-in-the-loop paradigm, where humans primarily act as supervisors or correctors who intervene when required, our approach focuses on human-machine collaboration in which both parties engage as partners, enabling more fluid and integrated interactions). The experimental results show that, in the binary classification task of distinguishing normal sinus rhythm from atrial fibrillation, the proposed model performs marginally below certain established baselines such as Convolutional Neural Networks (CNNs) and bidirectional long short-term memory with attention (Bi-LSTMAttns), but clearly surpasses other contemporary state-of-the-art prototype baseline models. In the three-class task covering normal sinus rhythm, atrial fibrillation, and other rhythm, it performs significantly better than these prototype baselines, reaching a prediction accuracy of 0.8414 with macro precision, recall, and F1-score of 0.8449, 0.8224, and 0.8235, respectively, thus achieving both high classification accuracy and good interpretability.
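The prototype-comparison step can be illustrated with a small sketch: after an encoder (left abstract here) maps a heartbeat to an embedding, the class is chosen by distance to learned prototype vectors. This is a simplified, hypothetical rendering of the general idea, not the authors' Bi-LSTM/attention implementation.

```python
import numpy as np

def classify_by_prototypes(embedding, prototypes, prototype_labels):
    """Assign the label of the closest prototype (squared Euclidean distance)."""
    dists = np.sum((prototypes - embedding) ** 2, axis=1)
    return prototype_labels[int(np.argmin(dists))], dists

# hypothetical example: 3 prototypes in a 16-dimensional embedding space
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, 16))
labels = np.array(["normal sinus rhythm", "atrial fibrillation", "other rhythm"])

heartbeat_embedding = rng.normal(size=16)   # output of the (omitted) encoder
pred, dists = classify_by_prototypes(heartbeat_embedding, prototypes, labels)
```

Because the prototypes live in the same space as the embeddings, a domain expert can in principle inspect, replace, or adjust a prototype vector directly, which is the kind of expert-driven refinement the abstract describes.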
Subjects
Electrocardiography, Neural Networks (Computer), Humans, Electrocardiography/methods, Atrial Fibrillation/physiopathology, Atrial Fibrillation/diagnosis, Deep Learning, Heart Rate/physiology, Algorithms, Computer-Assisted Signal Processing
ABSTRACT
The electrical energy supply relies on the satisfactory operation of insulators. The ultrasound recorded from insulators in different conditions has a time series output, which can be used to classify faulty insulators. The random convolutional kernel transform (Rocket) algorithms use convolutional filters to extract various features from the time series data. This paper proposes a combination of Rocket algorithms, machine learning classifiers, and empirical mode decomposition (EMD) methods, such as complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), empirical wavelet transform (EWT), and variational mode decomposition (VMD). The results show that the EMD methods, combined with MiniRocket, significantly improve the accuracy of logistic regression in insulator fault diagnosis. The proposed strategy achieves an accuracy of 0.992 using CEEMDAN, 0.995 with EWT, and 0.980 with VMD. These results highlight the potential of incorporating EMD methods in insulator failure detection models to enhance the safety and dependability of power systems.
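A sketch of one way to combine a decomposition step, MiniRocket features, and logistic regression, assuming the sktime and scikit-learn packages; the decomposition is reduced to a simple FFT band-splitting stand-in because the abstract does not give the CEEMDAN/EWT/VMD settings, and the MiniRocket import path varies between sktime versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# assumed import path; some sktime versions expose MiniRocket / MiniRocketMultivariate here
from sktime.transformations.panel.rocket import MiniRocketMultivariate

def decompose(signal, n_modes=4):
    """Placeholder for CEEMDAN / EWT / VMD: split the signal into frequency
    bands via FFT masking. Swap in, e.g., PyEMD's CEEMDAN or a VMD package here."""
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(0, len(spectrum), n_modes + 1, dtype=int)
    modes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        modes.append(np.fft.irfft(masked, n=len(signal)))
    return np.stack(modes)

# X: (n_recordings, length) ultrasound recordings, y: insulator condition labels (toy data)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 512))
y = rng.integers(0, 2, size=60)

X_modes = np.stack([decompose(x) for x in X])        # (n, n_modes, length)

features = MiniRocketMultivariate().fit_transform(X_modes)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(features, y)
print(clf.score(features, y))
```

The point of the combination is that each recording becomes a multivariate series of decomposition modes before the random convolutional kernels are applied, which is what the abstract reports as improving the logistic-regression accuracy.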
ABSTRACT
The machines of WF Maschinenbau process metal blanks into various workpieces using so-called flow-forming processes. The quality of these workpieces depends largely on the quality of the blanks and the condition of the machine. This creates an urgent need for automated monitoring of the forming processes and the condition of the machine. Since the complexity of the flow-forming processes makes physical modeling impossible, the present work deals with data-driven modeling using machine learning algorithms. The main contributions of this work lie in showcasing the feasibility of using machine learning and sensor data to monitor flow-forming processes, along with developing a practical approach for this purpose. The approach includes an experimental design capable of providing the necessary data, as well as a procedure for preprocessing the data and extracting features that capture the information needed by the machine learning models to detect defects in the blank and the machine. To make efficient use of the small number of experiments available, the experimental design is generated using Design of Experiments methods. It consists of two parts. In the first part, a pre-selection of influencing variables relevant to the forming process is performed. In the second part of the design, the selected variables are investigated in more detail. The preprocessing procedure consists of feature engineering, feature extraction, and feature selection. In the feature engineering step, the data set is augmented with time series variables that are meaningful in the domain. For feature extraction, an algorithm was developed based on the mechanisms of r-STSF, a state-of-the-art algorithm for time series classification, extending them to multivariate time series and metric target variables. This feature extraction algorithm can itself be seen as an additional contribution of this work, because it is not tied to the application domain of monitoring flow-forming processes but can be used as a feature extraction algorithm for multivariate time series classification in general. For feature selection, Recursive Feature Elimination is employed. With the resulting features, random forests are trained to detect several quality features of the blank and defects of the machine. The trained models achieve good prediction accuracy for most of the target variables. This shows that machine learning is a promising approach for the monitoring of flow-forming processes, which requires further investigation for confirmation.
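The feature-selection and modeling steps can be sketched with scikit-learn's Recursive Feature Elimination wrapped around a random forest; the feature matrix below stands in for the engineered and extracted time-series features, and all names, sizes, and the use of a regressor for a metric target are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# X: engineered/extracted features per forming run, y: a metric quality target (toy data)
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 200))          # 80 experiments, 200 candidate features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=80)

# Recursive Feature Elimination driven by random-forest importances
selector = RFE(RandomForestRegressor(n_estimators=200, random_state=0),
               n_features_to_select=20, step=10)
selector.fit(X, y)

# final model trained on the selected feature subset
model = RandomForestRegressor(n_estimators=500, random_state=0)
scores = cross_val_score(model, X[:, selector.support_], y, cv=5, scoring="r2")
print(scores.mean())
```

For categorical defect targets the same structure applies with a random-forest classifier in place of the regressor.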
ABSTRACT
Slope Entropy (SlpEn) is a novel method recently proposed in the field of time series entropy estimation. In addition to the well-known embedded dimension parameter, m, used in other methods, it applies two additional thresholds, denoted as δ and γ, to derive a symbolic representation of a data subsequence. The original paper introducing SlpEn provided some guidelines for recommended specific values of these two parameters, which have been successfully followed in subsequent studies. However, a deeper understanding of the role of these thresholds is necessary to explore the potential for further SlpEn optimisations. Some works have already addressed the role of δ, but in this paper, we extend this investigation to include the role of γ and explore the impact of using an asymmetric scheme to select threshold values. We conduct a comparative analysis between the standard SlpEn method as initially proposed and an optimised version obtained through a grid search to maximise signal classification performance based on SlpEn. The results confirm that the optimised version achieves higher time series classification accuracy, albeit at the cost of significantly increased computational complexity.
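For reference, a compact sketch of the SlpEn computation as commonly described (embedding dimension m plus thresholds γ > δ > 0 applied to consecutive slopes); normalisation conventions vary between implementations, so treat this as illustrative rather than a reference implementation.

```python
import numpy as np
from collections import Counter
from math import log

def slope_entropy(x, m=5, gamma=1.0, delta=1e-3):
    """Slope Entropy: symbolise the m-1 consecutive slopes of each m-sample
    window with thresholds gamma > delta > 0, then take the Shannon entropy
    of the resulting pattern distribution."""
    patterns = Counter()
    for i in range(len(x) - m + 1):
        diffs = np.diff(x[i:i + m])
        symbols = tuple(
            2 if d > gamma else        # steep positive slope
            1 if d > delta else        # moderate positive slope
            0 if d >= -delta else      # approximately flat
            -1 if d >= -gamma else     # moderate negative slope
            -2                         # steep negative slope
            for d in diffs)
        patterns[symbols] += 1
    total = sum(patterns.values())
    return -sum((c / total) * log(c / total) for c in patterns.values())

print(slope_entropy(np.sin(np.linspace(0, 20, 500)), m=5, gamma=0.05, delta=0.001))
```

The asymmetric scheme investigated in the paper amounts to allowing different threshold magnitudes for the positive and negative slope branches above, which is what the grid search optimises.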
ABSTRACT
The performance of time-series classification of electroencephalographic data varies strongly across experimental paradigms and study participants. Reasons include task-dependent differences in neuronal processing and seemingly random variations between subjects, among others. The extent to which data pre-processing techniques can ameliorate these challenges has received relatively little study. Here, the influence of spatial filter optimization methods and non-linear data transformation on time-series classification performance is analyzed using the example of high-frequency somatosensory evoked responses. This is a model paradigm for the analysis of high-frequency electroencephalography data at a very low signal-to-noise ratio, which emphasizes the differences between the explored methods. For the utilized data, the individual signal-to-noise ratio explained up to 74% of the performance differences between subjects. While data pre-processing was shown to increase average time-series classification performance, it could not fully compensate for the signal-to-noise ratio differences between subjects. This study proposes an algorithm to prototype and benchmark pre-processing pipelines for the paradigm and data set at hand. Extreme learning machines, Random Forest, and Logistic Regression can be used quickly to compare a set of potentially suitable pipelines. For subsequent classification, however, machine learning models were shown to provide better accuracy.
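The prototyping idea, comparing a few fast classifiers across candidate pre-processing pipelines, can be sketched with scikit-learn; the spatial-filtering and non-linear transformation steps are reduced to simple placeholders, and the extreme learning machine is omitted because scikit-learn has no built-in ELM, so only two of the three fast models from the abstract appear here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def pipeline_variants(X):
    """Candidate pre-processing variants (placeholders for spatial filtering
    and non-linear transformations such as rectification or log-power)."""
    return {
        "raw": X,
        "rectified": np.abs(X),
        "log_power": np.log1p(X ** 2),
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))          # epochs x samples (e.g., evoked responses)
y = rng.integers(0, 2, size=120)

for name, Xv in pipeline_variants(X).items():
    for clf in (LogisticRegression(max_iter=1000),
                RandomForestClassifier(n_estimators=200, random_state=0)):
        score = cross_val_score(clf, Xv, y, cv=5).mean()
        print(f"{name:10s} {clf.__class__.__name__:24s} {score:.3f}")
```

The quick cross-validated comparison identifies the most promising pipeline, which can then be paired with a heavier time-series classifier for the final model.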
Subjects
Algorithms, Electroencephalography, Humans, Electroencephalography/methods, Random Forest, Upper Extremity, Signal-to-Noise Ratio, Computer-Assisted Signal Processing
ABSTRACT
In this study, we developed a machine learning framework to detect clinical mastitis (CM) at the current milking (i.e., the same milking) and predict CM at the next milking (i.e., one milking before CM occurrence) at the quarter level. Time series quarter-level milking data were extracted from an automated milking system (AMS). For both CM detection and prediction, the best classification performance was obtained from the decision tree-based ensemble models. Moreover, applying the models to a data set containing data from the current milking and the 9 previous milkings gave the best accuracy for detecting CM, whereas modeling with a data set containing data from the current milking and the 7 previous milkings yielded the best results for predicting CM. The models combined with oversampling methods resulted in specificities of 95% and 93% for CM detection and prediction, respectively, with the same sensitivity (82%) in both scenarios; when specificity was lowered to 80% to 83%, undersampling techniques allowed the models to increase sensitivity to 95%. We propose a feasible machine learning framework to identify CM in a timely manner using imbalanced data from an AMS, which could provide useful information for farmers to manage the negative effects of CM.
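A sketch of the resampling-plus-ensemble setup, assuming the imbalanced-learn package for SMOTE oversampling and random undersampling; the windowed milking features, class ratio, and choice of gradient boosting as the tree-based ensemble are illustrative assumptions, not the study's actual variables.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# X: features built from the current milking plus the previous 7-9 milkings (toy data)
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))
y = (rng.random(5000) < 0.02).astype(int)      # ~2% clinical mastitis cases

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("oversampling (SMOTE)", SMOTE(random_state=0)),
                      ("undersampling", RandomUnderSampler(random_state=0))]:
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)
    clf = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
    y_pred = clf.predict(X_te)
    sens = recall_score(y_te, y_pred)                  # sensitivity
    spec = recall_score(y_te, y_pred, pos_label=0)     # specificity
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Swapping the resampling strategy is what moves the model along the sensitivity/specificity trade-off described in the abstract.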
Subjects
Cattle Diseases, Bovine Mastitis, Cattle, Female, Animals, Time Factors, Bovine Mastitis/diagnosis, Bovine Mastitis/epidemiology, Dairying/methods, Milk, Lactation
ABSTRACT
BACKGROUND: Large-scale medical equipment, which is extensively implemented in medical services, is of vital importance for diagnosis but vulnerable to various anomalies and failures. Most hospitals that conduct regular maintenance have been suffering from medical equipment-related incidents for years. Currently, the Internet of Medical Things (IoMT) has emerged as a crucial tool for monitoring the real-time status of medical equipment. In this paper, we developed an IoMT system for the Computed Tomography (CT) equipment at West China Hospital, Sichuan University, and collected the system status time-series data. Novel multivariate time-series classification models and frameworks are proposed to predict anomalies of the CT equipment, and the important features that are closely related to the equipment anomalies are identified with the model. METHODS: We extracted the real-time status time-series data of 11 pieces of CT equipment between May 19, 2020 and May 19, 2021 from the IoMT, including the equipment oil temperature, anode voltage, etc. Arcs are used as anomaly labels because of their relationship with decreased imaging quality and CT equipment failures. To improve prediction accuracy, statistics and transformations of the raw historical time-series data segment in a sliding time window are used to construct new features. Because of the particularity of time-series data, two frameworks are proposed for splitting the training and test sets. Then Decision Tree, Support Vector Machine, Logistic Regression, Naive Bayes, and K-Nearest Neighbor classification models are used to classify the system status. We also compare our model to state-of-the-art models. RESULTS: The results show that the anomaly prediction accuracy and recall of our method are 79% and 77%, respectively. The oil temperature and anode voltage are identified as the decisive features that may lead to anomalies. The proposed model outperforms the others when predicting anomalies of the CT equipment based on our dataset. CONCLUSIONS: The proposed method can predict the state of CT equipment and serve as a reference for practical maintenance, reducing unexpected anomalies of medical equipment. It also brings new insights into how to handle non-uniform and imbalanced time series data in practical cases.
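The sliding-window feature construction can be sketched with pandas rolling statistics; the window length, column names, toy status log, and the chronological train/test split are assumptions for illustration, not the system's exact configuration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def window_features(df, cols, window=12):
    """Rolling statistics of the raw status signals inside a sliding time window."""
    feats = {}
    for c in cols:
        roll = df[c].rolling(window)
        feats[f"{c}_mean"] = roll.mean()
        feats[f"{c}_std"] = roll.std()
        feats[f"{c}_max"] = roll.max()
        feats[f"{c}_diff"] = df[c].diff()
    return pd.DataFrame(feats).dropna()

# hypothetical status log; 'arc' marks the anomaly label described in the abstract
n = 200
df = pd.DataFrame({
    "oil_temperature": [25 + 0.05 * i + (i % 13) * 0.2 for i in range(n)],
    "anode_voltage": [75 + (i % 7) * 0.5 for i in range(n)],
    "arc": [1 if 100 <= i < 110 else 0 for i in range(n)],
})

X = window_features(df, ["oil_temperature", "anode_voltage"])
y = df.loc[X.index, "arc"]

# chronological split: train on the earlier records, test on the later ones
split = int(len(X) * 0.7)
clf = DecisionTreeClassifier(random_state=0).fit(X.iloc[:split], y.iloc[:split])
print(clf.score(X.iloc[split:], y.iloc[split:]))
```

Keeping the split chronological mirrors one of the two evaluation frameworks the paper motivates: the model should only ever be tested on records that come after its training data.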
Subjects
X-Ray Computed Tomography, Humans, Bayes Theorem, China, Cluster Analysis, Electrodes
ABSTRACT
The rise in crime rates in many parts of the world, coupled with advancements in computer vision, has increased the need for automated crime detection services. To address this issue, we propose a new approach for detecting suspicious behavior as a means of preventing shoplifting. Existing methods are based on convolutional neural networks that rely on extracting spatial features from pixel values. In contrast, our proposed method employs object detection based on YOLOv5 with Deep Sort to track people through a video, using the resulting bounding box coordinates as temporal features. The extracted temporal features are then modeled as a time-series classification problem. The proposed method was tested on the popular UCF Crime dataset and benchmarked against the current state-of-the-art robust temporal feature magnitude (RTFM) method, which relies on Inflated 3D ConvNet (I3D) preprocessing. Our results demonstrate an impressive 8.45-fold increase in detection inference speed compared to the state-of-the-art RTFM, along with an F1 score of 92%, outperforming RTFM by 3%. Furthermore, our method achieved these results without requiring expensive data augmentation or image feature extraction.
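A hypothetical sketch of turning per-frame tracker output into time-series features: given (frame, track_id, box) rows like those produced by a YOLOv5 + Deep Sort pipeline, each track becomes a multivariate series of box centre, size, and speed. The column names and derived features are illustrative; the abstract does not specify the exact encoding.

```python
import numpy as np
import pandas as pd

def track_to_series(track_df):
    """Convert one tracked person's bounding boxes into temporal features."""
    cx = (track_df["x1"] + track_df["x2"]) / 2
    cy = (track_df["y1"] + track_df["y2"]) / 2
    w = track_df["x2"] - track_df["x1"]
    h = track_df["y2"] - track_df["y1"]
    speed = np.hypot(cx.diff(), cy.diff()).fillna(0.0)
    return pd.DataFrame({"cx": cx, "cy": cy, "w": w, "h": h, "speed": speed})

# hypothetical tracker output: one row per detection (frame, track_id, box corners)
rng = np.random.default_rng(0)
det = pd.DataFrame({
    "frame": list(range(50)) * 2,
    "track_id": [1] * 50 + [2] * 50,
    "x1": rng.uniform(0, 100, 100),
    "y1": rng.uniform(0, 100, 100),
})
det["x2"] = det["x1"] + 40
det["y2"] = det["y1"] + 80

series_per_person = {tid: track_to_series(g.sort_values("frame"))
                     for tid, g in det.groupby("track_id")}
# each value is an (n_frames, 5) multivariate series to feed a time-series classifier
```

Working on these low-dimensional track series instead of raw pixels is what yields the large inference-speed advantage reported above.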
Subjects
Crime, Neural Networks (Computer), Humans, Crime/prevention & control
ABSTRACT
Sensor-based human action recognition (HAR) is considered to have broad practical prospects. It uses wearable devices to collect plantar pressure or acceleration information at human joints during actions, thereby identifying human motion patterns. Existing related works have mainly focused on improving recognition accuracy and have rarely considered energy-efficient management of portable HAR systems. Considering the high sensitivity and energy-harvesting ability of triboelectric nanogenerators (TENGs), in this research a TENG with an output performance of 9.98 mW/cm² was fabricated from polydimethylsiloxane and carbon nanotube film and used as a wearable sensor for sensor-based HAR. For real-time identification, data are acquired using a sliding window approach. However, classification accuracy is challenged by the quasi-periodic characteristics of the intercepted sequences. To solve this problem, compensatory dynamic time warping (C-DTW) is proposed, which adjusts the DTW result based on the proportion of points separated by small distances under the DTW alignment. Our simulation results show that the classification accuracy of C-DTW is higher than that of DTW and its improved versions (e.g., WDTW, DDTW and softDTW), with almost the same complexity. Moreover, C-DTW is much faster than shapeDTW at the same classification accuracy. Without loss of generality, the performance of existing DTW versions can be enhanced using the compensatory mechanism of C-DTW.
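A sketch of the idea behind C-DTW under stated assumptions: plain DTW is computed first, and the distance is then adjusted according to the fraction of aligned point pairs that lie within a small tolerance. The paper's exact compensation formula is not reproduced here; `tolerance` and the scaling rule are illustrative.

```python
import numpy as np

def dtw_with_path(a, b):
    """Classic O(len(a)*len(b)) DTW returning the distance and alignment path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return D[n, m], path[::-1]

def compensated_dtw(a, b, tolerance=0.1):
    """Illustrative compensation: shrink the DTW distance in proportion to the
    share of aligned pairs that are already close (within `tolerance`)."""
    dist, path = dtw_with_path(a, b)
    close = sum(abs(a[i] - b[j]) <= tolerance for i, j in path) / len(path)
    return dist * (1.0 - 0.5 * close)   # assumed scaling rule, not the paper's

a = np.sin(np.linspace(0, 6, 80))
b = np.sin(np.linspace(0.3, 6.3, 90))
print(compensated_dtw(a, b))
```

Because the compensation only post-processes the alignment already produced by DTW, the overall complexity stays essentially that of DTW, which is the property the abstract emphasises.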
Assuntos
Aceleração , Reconhecimento Automatizado de Padrão , Humanos , Simulação por Computador , Atividades Humanas , Movimento (Física)RESUMO
Crop identification is one of the most important tasks in digital farming. The use of remote sensing data makes it possible to clarify the boundaries of fields and identify fallow land. This study considered the possibility of using the seasonal variation in the Dual-polarization Radar Vegetation Index (DpRVI), which was calculated based on data acquired by the Sentinel-1B satellite between May and October 2021, as the main characteristic. Radar images of the Khabarovskiy District of the Khabarovsk Territory, as well as those of the Arkharinskiy, Ivanovskiy, and Oktyabrskiy districts in the Amur Region (Russian Far East), were obtained and processed. The identifiable classes were soybean and oat crops, as well as fallow land. Classification was carried out using the Support Vector Machines, Quadratic Discriminant Analysis (QDA), and Random Forest (RF) algorithms. The training (848 ha) and test (364 ha) samples were located in Khabarovskiy District. The best overall accuracy on the test set (82.0%) was achieved using RF. Classification accuracy at the field level was 79%. When using the QDA classifier on cropland in the Amur Region (2324 ha), the overall classification accuracy was 83.1% (F1 was 0.86 for soybean, 0.84 for fallow, and 0.79 for oat). Application of the Radar Vegetation Index (RVI) and VV/VH ratio enabled an overall classification accuracy in the Amur region of 74.9% and 74.6%, respectively. Thus, using DpRVI allowed us to achieve greater performance compared to other SAR data, and it can be used to identify crops in the south of the Far East and serve as the basis for the automatic classification of cropland.
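For context, a sketch of the per-pixel DpRVI computation from the 2x2 dual-pol covariance matrix, following the commonly cited formulation DpRVI = 1 − m·β, with m the degree of polarization and β the dominant-eigenvalue fraction; window averaging, calibration, and speckle filtering are omitted, so treat this as illustrative rather than the study's exact processing chain.

```python
import numpy as np

def dprvi(c11, c22, c12):
    """DpRVI from the elements of the dual-pol covariance matrix
    C2 = [[c11, c12], [conj(c12), c22]] (VV/VH intensities and their correlation)."""
    trace = c11 + c22
    det = c11 * c22 - np.abs(c12) ** 2
    m = np.sqrt(1.0 - 4.0 * det / trace ** 2)                 # degree of polarization
    lam1 = trace / 2.0 + np.sqrt((trace / 2.0) ** 2 - det)    # dominant eigenvalue
    beta = lam1 / trace
    return 1.0 - m * beta

# toy example with spatially averaged covariance elements
c11, c22 = 0.08, 0.02          # e.g., <|VV|^2>, <|VH|^2>
c12 = 0.01 + 0.005j            # <VV * conj(VH)>
print(dprvi(c11, c22, c12))
```

The seasonal sequence of such DpRVI values per field is then what serves as the feature vector for the SVM, QDA, and RF classifiers compared in the study.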
ABSTRACT
Video streaming service delivery is a challenging task for mobile network operators. Knowing which services clients are using could help ensure a specific quality of service and manage the users' experience. Additionally, mobile network operators could apply throttling, traffic prioritization, or differentiated pricing. However, due to the growth of encrypted Internet traffic, it has become difficult for network operators to recognize the type of service used by their clients. In this article, we propose and evaluate a method for recognizing video streams solely based on the shape of the bitstream on a cellular network communication channel. To classify bitstreams, we used a convolutional neural network trained on a dataset of download and upload bitstreams collected by the authors. We demonstrate that the proposed method achieves an accuracy of over 90% in recognizing video streams from real-world mobile network traffic data.
ABSTRACT
Structural health monitoring (SHM) has been extensively utilized in civil infrastructures for several decades. The status of civil constructions is monitored in real time using a wide variety of sensors; however, determining the true state of a structure can be difficult due to the presence of abnormalities in the acquired data. Extreme weather, faulty sensors, and structural damage are common causes of these abnormalities. For civil structure monitoring to be successful, abnormalities must be detected quickly. In addition, one form of abnormality generally predominates in the SHM data, which can be a problem for civil infrastructure data. The current state of anomaly detection is severely hampered by this imbalance, and even cutting-edge damage diagnostic methods are useless without proper data-cleansing processes. To address this problem, this study proposes a hyperparameter-tuned convolutional neural network (CNN) for multiclass imbalanced anomaly detection. A multiclass time series of anomaly data from a real-world cable-stayed bridge is used to test the 1D CNN model, and the dataset is balanced by augmenting the data as necessary. An overall accuracy of 97.6% was achieved after balancing the database with data augmentation to enlarge the dataset.
ABSTRACT
One possible device authentication method is based on device fingerprints, such as software- or hardware-based unique characteristics. In this paper, we propose a fingerprinting technique based on passively measured external information, i.e., current consumption from the electrical network. The key insight is that small hardware discrepancies naturally exist even between same-electrical-circuit devices, making it feasible to identify slight variations in the consumed current under steady-state conditions. An experimental database of current consumption signals was collected from two similar groups, each containing 20 same-model computer displays. The resulting signals were classified using various state-of-the-art time-series classification (TSC) methods. We successfully identified 40 similar (same-model) electrical devices with about 94% precision, while most errors were concentrated in confusion between a small number of devices. A simplified empirical wavelet transform (EWT) paired with a linear discriminant analysis (LDA) classifier was shown to be the recommended classification method.
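A sketch of the classification stage, assuming simple FFT band-power features in place of the empirical wavelet transform (which is not reproduced here) and scikit-learn's LDA classifier; the device counts match the study, but the signals, sampling, and band layout are toy assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def band_powers(signals, n_bands=24):
    """Average spectral power in n_bands equal-width frequency bands."""
    spectra = np.abs(np.fft.rfft(signals, axis=1)) ** 2
    bands = np.array_split(spectra, n_bands, axis=1)
    return np.column_stack([b.mean(axis=1) for b in bands])

# hypothetical steady-state current recordings: 40 devices x 25 recordings each
rng = np.random.default_rng(0)
signals = rng.normal(size=(40 * 25, 2048))
labels = np.repeat(np.arange(40), 25)

X = band_powers(signals)
clf = LinearDiscriminantAnalysis()
print(cross_val_score(clf, X, labels, cv=5).mean())
```

The attraction of the LDA-on-compact-features route reported in the abstract is that it keeps both training and inference cheap while still separating the 40 near-identical devices.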
Subjects
Electricity, Wavelet Analysis
ABSTRACT
As the Internet-of-Things is deployed widely, a great deal of time-series data is generated every day. Thus, classifying time series automatically has become important. Compression-based pattern recognition has attracted attention because it can analyze various data universally with few model parameters. RPCD (Recurrence Plots Compression Distance) is a compression-based time-series classification method. First, RPCD transforms time-series data into an image called a "Recurrence Plot (RP)". Then, the distance between two time-series data is determined as the dissimilarity between their RPs. Here, the dissimilarity between two images is computed from the file size obtained when an MPEG-1 encoder compresses a video that serializes the two images in order. In this paper, by analyzing the RPCD, we give an important insight: the quality parameter for the MPEG-1 encoding that controls the resolution of compressed videos strongly influences the classification performance. We also show that the optimal parameter value depends heavily on the dataset to be classified; interestingly, the optimal value for one dataset can make the RPCD fall behind a naive random classifier for another dataset. Supported by these insights, we propose an improved version of RPCD named qRPCD, which searches for the optimal parameter value by means of cross-validation. Experimentally, qRPCD outperforms the original RPCD by about 4% in terms of classification accuracy.
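A minimal sketch of the recurrence-plot step that RPCD starts from: a thresholded pairwise-distance image of the (optionally delay-embedded) series. The MPEG-1 compression and the quality-parameter search are not reproduced; the threshold and embedding settings are illustrative.

```python
import numpy as np

def recurrence_plot(x, m=3, tau=1, eps=0.2):
    """Binary recurrence plot of a series after delay embedding (m, tau)."""
    n = len(x) - (m - 1) * tau
    emb = np.column_stack([x[i * tau: i * tau + n] for i in range(m)])
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    return (dists <= eps).astype(np.uint8)

x = np.sin(np.linspace(0, 12 * np.pi, 400)) + 0.05 * np.random.randn(400)
rp = recurrence_plot(x)          # 2-D image; RPCD compares pairs of such images
print(rp.shape, rp.mean())
```

In qRPCD, the free parameter being tuned is the MPEG-1 quality setting used when two such plots are compressed as a two-frame video; cross-validating that parameter is what distinguishes qRPCD from the original RPCD.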
ABSTRACT
BACKGROUND: COVID-19 caused more than 622 thousand deaths in Brazil. The infection can be asymptomatic or cause mild symptoms, but it can also evolve into a severe disease and lead to death. It is difficult to predict which patients will develop severe disease. There are, in the literature, machine learning models capable of assisting diagnosis and predicting outcomes for several diseases, but usually these models require laboratory tests and/or imaging. METHODS: We conducted an observational cohort study that evaluated vital signs and measurements from patients who were admitted to Hospital das Clínicas (São Paulo, Brazil) between March 2020 and October 2021 due to COVID-19. The data were then represented as univariate and multivariate time series, which were used to train and test machine learning models capable of predicting a patient's outcome. RESULTS: Time series-based machine learning models are capable of predicting a COVID-19 patient's outcome with up to 96% general accuracy and 81% accuracy considering only the first hospitalization day. The models can reach up to 99% sensitivity (discharge prediction) and up to 91% specificity (death prediction). CONCLUSIONS: Results indicate that time series-based machine learning models combined with easily obtainable data can predict COVID-19 outcomes and support clinical decisions. With further research, these models can potentially help doctors diagnose other diseases.
Subjects
COVID-19, Brazil/epidemiology, COVID-19/epidemiology, Electronic Health Records, Hospitalization, Humans, Retrospective Studies, Time Factors
ABSTRACT
Automatic Identification System (AIS) messages are useful for tracking vessel activity across oceans worldwide using radio links and satellite transceivers. Such data play a significant role in tracking vessel activity and mapping mobility patterns such as those found during fishing activities. Accordingly, this paper proposes a geometric-driven semi-supervised approach for fishing activity detection from AIS data. Through the proposed methodology, we show how to explore the information included in the messages to extract features describing the geometry of the vessel route. To this end, we leverage the unsupervised nature of cluster analysis to label the trajectory geometry, highlighting changes in the vessel's moving pattern that tend to indicate fishing activity. The labels obtained by the proposed unsupervised approach are used to detect fishing activities, which we approach as a time-series classification task. We propose a solution using recurrent neural networks on AIS data streams that achieves roughly 87% overall F-score on the whole trajectories of 50 different unseen fishing vessels. These results are accompanied by a broad benchmark study assessing the performance of different Recurrent Neural Network (RNN) architectures. In conclusion, this work contributes a thorough process that includes data preparation, labeling, data modeling, and model validation, presenting a novel solution for mobility pattern detection that relies on unfolding the geometry observed in the trajectory.
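A sketch of the geometric labeling idea under stated assumptions: per-message speed and course-change features are computed from AIS positions and clustered (here with k-means) to flag segments whose geometry departs from straight transit; the feature set, the two-cluster choice, and the toy trajectory are illustrative, not the paper's exact pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def geometry_features(traj):
    """Speed and course change derived from consecutive AIS positions."""
    dx, dy = traj["lon"].diff(), traj["lat"].diff()
    dt = traj["t"].diff().replace(0, np.nan)
    speed = np.hypot(dx, dy) / dt
    course = np.arctan2(dy, dx)
    turn = course.diff().abs()          # note: angle wrap-around ignored in this sketch
    return pd.DataFrame({"speed": speed, "turn": turn}).dropna()

# hypothetical single-vessel trajectory (lon/lat in degrees, t in hours)
rng = np.random.default_rng(0)
traj = pd.DataFrame({
    "t": np.arange(300, dtype=float),
    "lon": np.cumsum(rng.normal(0.01, 0.005, 300)),
    "lat": np.cumsum(rng.normal(0.01, 0.005, 300)),
})

feats = geometry_features(traj)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(feats))
# the cluster with low speed and frequent turning is a candidate "fishing" label,
# which then supervises the downstream RNN classifier on the raw AIS streams
```

The unsupervised labels play the role of weak ground truth, which is what makes the overall approach semi-supervised.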
Subjects
Hunting, Neural Networks (Computer), Cluster Analysis, Oceans and Seas
ABSTRACT
Background: Digital clinical measures collected via various digital sensing technologies such as smartphones, smartwatches, wearables, and ingestible and implantable sensors are increasingly used by individuals and clinicians to capture health outcomes or behavioral and physiological characteristics of individuals. Time series classification (TSC) is very commonly used for modeling digital clinical measures. While deep learning models for TSC are very common and powerful, there exist some fundamental challenges. This review presents the non-deep-learning models that are commonly used for time series classification in biomedical applications and that can achieve high performance. Objective: We performed a systematic review to characterize the techniques that are used in time series classification of digital clinical measures throughout all the stages of data processing and model building. Methods: We conducted a literature search on PubMed, as well as the Institute of Electrical and Electronics Engineers (IEEE), Web of Science, and SCOPUS databases, using a range of search terms to retrieve peer-reviewed articles reporting on academic research about digital clinical measures from the five-year period between June 2016 and June 2021. We identified and categorized the research studies based on the types of classification algorithms and sensor input types. Results: We found 452 papers in total from the four databases: PubMed, IEEE, Web of Science, and SCOPUS. After removing duplicates and irrelevant papers, 135 articles remained for detailed review and data extraction. Among these, engineered features derived with time series methods and subsequently fed into widely used machine learning classifiers were the most commonly used technique, and also most frequently achieved the best performance metrics (77 out of 135 articles). Statistical modeling (24 out of 135 articles) was the second most common and also the second-best classification technique. Conclusions: In this review paper, the time series classification models and interpretation methods used in biomedical applications are summarized and categorized. While high time series classification performance has been achieved for digital clinical, physiological, or biomedical measures, no standard benchmark datasets, modeling methods, or reporting methodology exist. There is no single widely used method for time series model development or feature interpretation; however, many different methods have proven successful.
Subjects
Algorithms, Machine Learning, Humans, Smartphone, Time Factors
ABSTRACT
The popularity of action recognition (AR) approaches and the need for improvement of their effectiveness require the generation of artificial samples addressing the nonlinearity of the time-space, scarcity of data points, or their variability. Therefore, in this paper, a novel approach to time series augmentation is proposed. The method improves the suboptimal warped time series generator algorithm (SPAWNER), introducing constraints based on identified AR-related problems with generated data points. Specifically, the proposed ARSPAWNER removes potential new time series that do not offer additional knowledge to the examples of a class or are created far from the occupied area. The constraints are based on statistics of time series of AR classes and their representative examples inferred with dynamic time warping barycentric averaging technique (DBA). The extensive experiments performed on eight AR datasets using three popular time series classifiers reveal the superiority of the introduced method over related approaches.
Subjects
Algorithms, Human Activities, Humans
ABSTRACT
The reliable assessment of muscle states, such as contracted vs. non-contracted muscles or relaxed vs. fatigued muscles, is crucial in many sports and rehabilitation scenarios, such as the assessment of therapeutic measures. The goal of this work was to deploy machine learning (ML) models based on one-dimensional (1-D) sonomyography (SMG) signals to facilitate low-cost and wearable ultrasound devices. One-dimensional SMG is a non-invasive technique that uses 1-D ultrasound radio-frequency signals to measure muscle states and has the advantage of being able to acquire information from deep soft-tissue layers. To mimic real-life scenarios, we did not emphasize the acquisition of particularly distinct signals. The ML models exploited muscle contraction signals from eight volunteers and muscle fatigue signals from 21 volunteers. We evaluated them with different schemes on a variety of data types, such as unprocessed or processed raw signals, and found that comparatively simple ML models, such as Support Vector Machines or Logistic Regression, yielded the best performance with respect to accuracy and evaluation time. We conclude that our framework for muscle contraction and muscle fatigue classification is well suited to facilitate low-cost and wearable devices based on ML models using 1-D SMG.