Results 1 - 20 of 48
1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36464485

ABSTRACT

Due to the increasing importance of graphs and graph streams in data representation in today's era, concept drift detection in graph streaming scenarios is more important than ever. Contributions to concept drift detection in graph streams are minimal and practically non-existent in the field of toxicology. This paper applied the discriminative subgraph-based drift detector (DSDD) to graph streams generated from real-world toxicology datasets. We used four toxicology datasets, each of which yielded two graph streams - one with abrupt drift points and one with gradual drift points. We used DSDD both with the standard minimum description length (MDL) heuristic and after replacing MDL with a much simpler heuristic SIZE (number of vertices + number of edges), and applied it to all generated graph streams containing abrupt drift points and gradual drift points for varying window sizes. Following that, we compared and analyzed the results. Finally, we applied a long short-term memory based graph stream classification model to all the generated streams and compared the difference in the performances obtained with and without detecting drift using DSDD. We believe that the results and analysis presented in this paper will provide insight into the task of concept drift detection in the toxicology domain and aid in the application of DSDD in a variety of scenarios.
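The SIZE heuristic named in the abstract is only a count, so it can be sketched in a few lines. The function below is a minimal illustration, assuming undirected graphs supplied as vertex and edge lists; the name `graph_size` and the deduplication of repeated edges are assumptions made here, not details taken from the paper.

```python
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def graph_size(vertices: Iterable[Hashable], edges: Iterable[Edge]) -> int:
    """SIZE = number of distinct vertices + number of distinct edges.

    Assumption: edges are undirected, so (a, b) and (b, a) count once.
    """
    vertex_set = set(vertices)
    edge_set = {frozenset(edge) for edge in edges}
    # Endpoints that appear only in the edge list still count as vertices.
    for edge in edge_set:
        vertex_set.update(edge)
    return len(vertex_set) + len(edge_set)


# Toy three-atom chain C-C-O: 3 vertices + 2 edges.
chain_size = graph_size(["C1", "C2", "O1"], [("C1", "C2"), ("C2", "O1")])
```

Compared with MDL, a count like this needs no encoding model, which is presumably why the abstract calls it "much simpler".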

2.
BMC Med Inform Decis Mak ; 24(1): 34, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38308256

ABSTRACT

BACKGROUND: Concept drift and covariate shift lead to a degradation of machine learning (ML) models. The objective of our study was to characterize sudden data drift as caused by the COVID pandemic. Furthermore, we investigated the suitability of certain methods in model training to prevent model degradation caused by data drift. METHODS: We trained different ML models with the H2O AutoML method on a dataset comprising 102,666 cases of surgical patients collected in the years 2014-2019 to predict postoperative mortality using preoperatively available data. Models applied were Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning and Stacked Ensembles comprising all base models. Further, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: (1) we gave older data a lower weight, (2) used only the most recent data for model training, and (3) performed a z-transformation of the numerical input parameters. Afterwards, we tested model performance on a pre-pandemic and an in-pandemic data set not used in the training process, and analysed common features. RESULTS: The models produced showed excellent areas under the receiver-operating characteristic curve and acceptable precision-recall curves when tested on a dataset from January-March 2020, but significant degradation when tested on a dataset collected in the first wave of the COVID pandemic from April-May 2020. When comparing the probability distributions of the input parameters, significant differences between pre-pandemic and in-pandemic data were found. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case.
However, the models varied considerably in the composition of their input parameters. None of our applied modifications prevented a loss of performance, although very different models emerged from it, using a large variety of parameters. CONCLUSIONS: Our results show that none of our tested easy-to-implement measures in model training can prevent deterioration in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.
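Two of the training modifications described in this abstract, down-weighting older data and z-transforming numerical inputs, can be illustrated as follows. This is a minimal sketch, not the study's procedure: the exponential half-life weighting, the `half_life` value, and the function names are assumptions, since the abstract does not state which weighting scheme was used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def recency_weights(years: Sequence[int], newest: int,
                    half_life: float = 2.0) -> List[float]:
    """Hypothetical scheme: a case's weight halves every `half_life` years of age."""
    return [0.5 ** ((newest - year) / half_life) for year in years]


def z_transform(values: Sequence[float]) -> List[float]:
    """Standardise to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


weights = recency_weights([2019, 2017, 2015], newest=2019)
scaled = z_transform([1.0, 2.0, 3.0])
```

Note that a z-transformation fitted on pre-pandemic data rescales inputs but does not remove a shift in their distribution, which is consistent with the abstract's finding that it did not prevent the performance loss.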


Subjects
COVID-19 , Pandemics , Humans , COVID-19/epidemiology , Algorithms , Hospital Mortality , Machine Learning
3.
Sensors (Basel) ; 24(7)2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38610294

ABSTRACT

The rapid development of the Internet of Things (IoT) has brought many conveniences to our daily life. However, it has also introduced various security risks that need to be addressed. The proliferation of IoT botnets is one of these risks. Most researchers have had some success in IoT botnet detection using artificial intelligence (AI). However, they have not considered the impact of dynamic network data streams on the models in real-world environments. Over time, existing detection models struggle to cope with evolving botnets. To address this challenge, we propose an incremental learning approach based on Gradient Boosting Decision Trees (GBDT), called GBDT-IL, for detecting botnet traffic in IoT environments. It improves the robustness of the framework by adapting to dynamic IoT data using incremental learning. Additionally, it incorporates an enhanced Fisher Score feature selection algorithm, which enables the model to achieve a high accuracy even with a smaller set of optimal features, thereby reducing the system resources required for model training. To evaluate the effectiveness of our approach, we conducted experiments on the BoT-IoT, N-BaIoT, MedBIoT, and MQTTSet datasets. We compared our method with similar feature selection algorithms and existing concept drift detection algorithms. The experimental results demonstrated that our method achieved an average accuracy of 99.81% using only 25 features, outperforming similar feature selection algorithms. Furthermore, our method achieved an average accuracy of 96.88% in the presence of different types of drifting data, which is 2.98% higher than the best available concept drift detection algorithms, while maintaining a low average false positive rate of 3.02%.

4.
Sensors (Basel) ; 24(9)2024 Apr 27.
Article in English | MEDLINE | ID: mdl-38732892

ABSTRACT

Future air quality monitoring networks will integrate fleets of low-cost gas and particulate matter sensors that are calibrated using machine learning techniques. Unfortunately, it is well known that concept drift is one of the primary causes of data quality loss in machine learning application operational scenarios. The present study focuses on addressing the calibration model update of low-cost NO2 sensors once they are triggered by a concept drift detector. It also defines which data are the most appropriate to use in the model updating process to gain compliance with the relative expanded uncertainty (REU) limits established by the European Directive. As the examined methodologies, the general/global and the importance weighting calibration models were applied for concept drift effects mitigation. Overall, for all the devices under test, the experimental results show the inadequacy of both models when performed independently. On the other hand, the results from the application of both models through a stacking ensemble strategy were able to extend the temporal validity of the used calibration model by three weeks at least for all the sensor devices under test. Thus, the usefulness of the whole information content gathered throughout the original co-location process was maximized.

5.
Sensors (Basel) ; 23(6)2023 Mar 20.
Article in English | MEDLINE | ID: mdl-36991969

ABSTRACT

Industrial collaborative robots (cobots) are known for their ability to operate in dynamic environments to perform many different tasks (since they can be easily reprogrammed). Due to their features, they are largely used in flexible manufacturing processes. Since fault diagnosis methods are generally applied to systems where the working conditions are bounded, problems arise when defining condition monitoring architecture, in terms of setting absolute criteria for fault analysis and interpreting the meanings of detected values since working conditions may vary. The same cobot can be easily programmed to accomplish more than three or four tasks in a single working day. The extreme versatility of their use complicates the definition of strategies for detecting abnormal behavior. This is because any variation in working conditions can result in a different distribution of the acquired data stream. This phenomenon can be viewed as concept drift (CD). CD is defined as the change in data distribution that occurs in dynamically changing and nonstationary systems. Therefore, in this work, we propose an unsupervised anomaly detection (UAD) method that is capable of operating under CD. This solution aims to identify data changes coming from different working conditions (the concept drift) or a system degradation (failure) and, at the same time, can distinguish between the two cases. Additionally, once a concept drift is detected, the model can be adapted to the new conditions, thereby avoiding misinterpretation of the data. This paper concludes with a proof of concept (POC) that tests the proposed method on an industrial collaborative robot.

6.
Sensors (Basel) ; 23(23)2023 Nov 26.
Article in English | MEDLINE | ID: mdl-38067800

ABSTRACT

With the development of intelligent IoT applications, vast amounts of data are generated by various volume sensors. These sensor data need to be reduced at the sensor and then reconstructed later to save bandwidth and energy. As the reduced data increase, the reconstructed data become less accurate. Usually, the trade-off between reduction rate and reconstruction accuracy is controlled by the reduction threshold, which is calculated by experiments based on historical data. Considering the dynamic nature of IoT, a fixed threshold cannot balance the reduction rate with the reconstruction accuracy adaptively. Aiming to dynamically balance the reduction rate with the reconstruction accuracy, an autonomous IoT data reduction method based on an adaptive threshold is proposed. During data reduction, concept drift detection is performed to capture IoT dynamic changes and trigger threshold adjustment. During data reconstruction, a data trend is added to improve reconstruction accuracy. The effectiveness of the proposed method is demonstrated by comparing the proposed method with the basic Kalman filtering algorithm, LMS algorithm, and PIP algorithm on stationary and nonstationary datasets. Compared with not applying the adaptive threshold, on average, there is an 11.7% improvement in accuracy for the same reduction rate or a 17.3% improvement in reduction rate for the same accuracy.

7.
Sensors (Basel) ; 23(7)2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37050795

ABSTRACT

Concept drift (CD) in data streaming scenarios such as networking intrusion detection systems (IDS) refers to the change in the statistical distribution of the data over time. There are five principal variants related to CD: incremental, gradual, recurrent, sudden, and blip. Genetic programming combiner (GPC) classification is an effective core candidate for data stream classification for IDS. However, its basic structure relies on the usage of traditional static machine learning models that receive one-time training, limiting its ability to handle CD. To address this issue, we propose an extended variant of the GPC using three main components. First, we replace existing classifiers with alternatives: online sequential extreme learning machine (OSELM), feature adaptive OSELM (FA-OSELM), and knowledge preservation OSELM (KP-OSELM). Second, we add two new components to the GPC, specifically, a data balancing and a classifier update. Third, the coordination between the sub-models produces three novel variants of the GPC: GPC-KOS for KP-OSELM; GPC-FOS for FA-OSELM; and GPC-OS for OSELM. This article presents the first data stream-based classification framework that provides novel strategies for handling CD variants. The experimental results demonstrate that both GPC-KOS and GPC-FOS outperform the traditional GPC and other state-of-the-art methods, and the transfer learning and memory features contribute to the effective handling of most types of CD. Moreover, the application of our incremental variants on real-world datasets (KDD Cup '99, CICIDS-2017, CSE-CIC-IDS-2018, and ISCX '12) demonstrates improved performance (GPC-FOS in connection with CSE-CIC-IDS-2018 and CICIDS-2017; GPC-KOS in connection with ISCX2012 and KDD Cup '99), with maximum accuracy rates of 100% and 98% by GPC-KOS and GPC-FOS, respectively. Additionally, our GPC variants do not show superior performance in handling blip drift.

8.
Sensors (Basel) ; 23(16)2023 Aug 10.
Article in English | MEDLINE | ID: mdl-37631632

ABSTRACT

This paper addresses the growing demand for healthcare systems, particularly among the elderly population. The need for these systems arises from the desire to enable patients and seniors to live independently in their homes without relying heavily on their families or caretakers. To achieve substantial improvements in healthcare, it is essential to ensure the continuous development and availability of information technologies tailored explicitly for patients and elderly individuals. The primary objective of this study is to comprehensively review the latest remote health monitoring systems, with a specific focus on those designed for older adults. To facilitate a comprehensive understanding, we categorize these remote monitoring systems and provide an overview of their general architectures. Additionally, we emphasize the standards utilized in their development and highlight the challenges encountered throughout the developmental processes. Moreover, this paper identifies several potential areas for future research, which promise further advancements in remote health monitoring systems. Addressing these research gaps can drive progress and innovation, ultimately enhancing the quality of healthcare services available to elderly individuals. This, in turn, empowers them to lead more independent and fulfilling lives while enjoying the comforts and familiarity of their own homes. By acknowledging the importance of healthcare systems for the elderly and recognizing the role of information technologies, we can address the evolving needs of this population. Through ongoing research and development, we can continue to enhance remote health monitoring systems, ensuring they remain effective, efficient, and responsive to the unique requirements of elderly individuals.


Subjects
Evidence Gaps , Information Technology , Humans , Aged , Recognition, Psychology
9.
Knowl Inf Syst ; 64(5): 1385-1416, 2022.
Article in English | MEDLINE | ID: mdl-35340819

ABSTRACT

Existing well-investigated Predictive Process Monitoring techniques typically construct a predictive model based on past process executions and then use this model to predict the future of new ongoing cases, without the possibility of updating it with new cases when they complete their execution. This can make Predictive Process Monitoring too rigid to deal with the variability of processes working in real environments that continuously evolve and/or exhibit new variant behaviours over time. As a solution to this problem, we evaluate the use of three different strategies that allow the periodic rediscovery or incremental construction of the predictive model so as to exploit new available data. The evaluation focuses on the performance of the new learned predictive models, in terms of accuracy and time, against the original one, and uses a number of real and synthetic datasets with and without explicit Concept Drift. The results provide evidence of the potential of incremental learning algorithms for predictive process monitoring in real environments.

10.
Entropy (Basel) ; 24(7)2022 Jun 30.
Article in English | MEDLINE | ID: mdl-35885132

ABSTRACT

This paper presents a set of methods, jointly called PGraphD*, which includes two new methods (PGraphDD-QM and PGraphDD-SS) for drift detection and one new method (PGraphDL) for drift localisation in business processes. The methods are based on deep learning and graphs, with PGraphDD-QM and PGraphDD-SS employing a quality metric and a similarity score for detecting drifts, respectively. According to experimental results, PGraphDD-SS outperforms PGraphDD-QM in drift detection, achieving an accuracy score of 100% over the majority of synthetic logs and an accuracy score of 80% over a complex real-life log. Furthermore, PGraphDD-SS detects drifts with delays that are 59% shorter on average compared to the best performing state-of-the-art method.

11.
J Biomed Inform ; 121: 103862, 2021 09.
Article in English | MEDLINE | ID: mdl-34229062

ABSTRACT

It has not been long since a new disease called COVID-19 has hit the international community. Unknown nature of the virus, evidence of its adaptability and survival in new conditions, its widespread prevalence and also lengthy recovery period, along with daily notifications of new infection and fatality statistics, have created a wave of fear and anxiety among the public community and authorities. These factors have led to extreme changes in the social discourse in a rather short period of time. The analysis of this discourse is important to reconcile the society and restore ordinary conditions of mental peace and health. Although much research has been done on the disease since its international pandemic, the sociological analysis of the recent public phenomenon, especially in developing countries, still needs attention. We propose a framework for analyzing social media data and news stories oriented around COVID-19 disease. Our research is based on an extensive Persian data set gathered from different social media networks and news agencies in the period of January 21-April 29, 2020. We use the Latent Dirichlet Allocation (LDA) model and dynamic topic modeling to understand and capture the change of discourse in terms of temporal subjects. We scrutinize the reasons of subject alternations by exploring the related events and adopted practices and policies. The social discourse can highly affect the community morale and polarization. Therefore, we further analyze the polarization in online social media posts, and detect points of concept drift in the stream. Based on the analyzed content, effective guidelines are extracted to shift polarization towards positive. The results show that the proposed framework is able to provide an effective practical approach for cause and effect analysis of the social discourse.


Subjects
COVID-19 , Social Media , Humans , Iran/epidemiology , Pandemics , SARS-CoV-2
12.
Sensors (Basel) ; 21(8)2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33920950

ABSTRACT

This review presents the state of the art and a global overview of research challenges of real-time distributed activity recognition in the field of healthcare. Offline activity recognition is discussed as a starting point to establish the useful concepts of the field, such as sensor types, activity labeling and feature extraction, outlier detection, and machine learning. New challenges and obstacles brought on by real-time centralized activity recognition such as communication, real-time activity labeling, cloud and local approaches, and real-time machine learning in a streaming context are then discussed. Finally, real-time distributed activity recognition is covered through existing implementations in the scientific literature, and six main angles of optimization are defined: processing, memory, communication, energy, time, and accuracy. This survey is addressed to any reader interested in the development of distributed artificial intelligence as well as activity recognition, regardless of their level of expertise.


Subjects
Artificial Intelligence , Machine Learning , Delivery of Health Care
13.
Entropy (Basel) ; 23(7)2021 Jul 04.
Article in English | MEDLINE | ID: mdl-34356400

ABSTRACT

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift is a change in the data's underlying distribution, a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. At the same time, the Adaptive Random Forest (ARF) is a stream learning algorithm that showed promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances' continuity allows for their binomial distribution to be approximated to a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms' efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects in online learning; accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
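The resampling step this abstract refers to can be sketched as Poisson-weighted online bagging: each ensemble member sees each incoming instance k times, with k drawn from Poisson(λ). The code below is an illustrative sketch only; the function names are hypothetical, the sampler is Knuth's classic method (adequate for small λ), and the paper's effectiveness measure ρ is not reproduced because the abstract does not define it.

```python
import math
import random
from typing import List


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def instance_weights(n_learners: int, lam: float, rng: random.Random) -> List[int]:
    """How many times each ensemble member trains on the current instance."""
    return [poisson_sample(lam, rng) for _ in range(n_learners)]


rng = random.Random(0)
weights = instance_weights(n_learners=10, lam=1.0, rng=rng)
```

With λ = 1 this matches the Poisson(1) approximation mentioned in the abstract; a larger λ makes each learner train on more copies per instance, which trades execution time against accuracy, the two quantities the abstract says ρ combines.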

14.
Entropy (Basel) ; 23(3)2021 Mar 23.
Article in English | MEDLINE | ID: mdl-33807028

ABSTRACT

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data streaming is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), as well as an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm compared to both stationary and non-stationary state-of-the-art TS baselines. We exploited synthetic environments (both randomly-generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web-news, and a time-series analysis on microbial organisms in the tropical air ecosystem.
The f-dsw TS approach emerges as the best performing MAB algorithm. At least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, proving the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
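The idea of combining a discounted reward history with a per-arm sliding window through an aggregation function f can be sketched for Bernoulli rewards as below. This is a reading of the abstract, not the authors' algorithm: the Beta(1, 1) priors, the discounting of all arms on every step, the class name `FDswThompsonSketch`, and the default `gamma` and `window` values are all assumptions.

```python
import random
from collections import deque
from statistics import mean
from typing import Callable, Sequence


class FDswThompsonSketch:
    """Illustrative Thompson Sampling with two estimates per arm, merged by f."""

    def __init__(self, n_arms: int, gamma: float = 0.95, window: int = 50,
                 f: Callable[[Sequence[float]], float] = min, seed: int = 0):
        self.gamma = gamma          # discount factor on the reward history
        self.f = f                  # aggregation: min, max, or statistics.mean
        self.rng = random.Random(seed)
        self.succ = [0.0] * n_arms  # discounted success counts
        self.fail = [0.0] * n_arms  # discounted failure counts
        self.windows = [deque(maxlen=window) for _ in range(n_arms)]

    def select(self) -> int:
        scores = []
        for s, fl, recent in zip(self.succ, self.fail, self.windows):
            # Sample from the discounted-history posterior.
            hist_draw = self.rng.betavariate(1.0 + s, 1.0 + fl)
            # Sample from the sliding-window posterior.
            wins = sum(recent)
            win_draw = self.rng.betavariate(1.0 + wins, 1.0 + len(recent) - wins)
            scores.append(self.f([hist_draw, win_draw]))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: int) -> None:
        # Forget old evidence for every arm, then record the new reward.
        self.succ = [self.gamma * s for s in self.succ]
        self.fail = [self.gamma * x for x in self.fail]
        self.succ[arm] += reward
        self.fail[arm] += 1 - reward
        self.windows[arm].append(reward)


bandit = FDswThompsonSketch(n_arms=2, gamma=0.5, window=2, f=mean)
for r in (1, 0, 1):
    bandit.update(0, r)
chosen = bandit.select()
```

Passing `f=min` gives the pessimistic variant the abstract reports as most effective on the real-world tasks: an arm scores well only if both the discounted history and the recent window support it.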

15.
Sensors (Basel) ; 20(7)2020 Apr 09.
Article in English | MEDLINE | ID: mdl-32283841

ABSTRACT

With the increasing popularity of the Internet-of-Medical-Things (IoMT) and smart devices, huge volumes of data streams have been generated. This study aims to address concept drift, which is a major challenge in the processing of voluminous data streams. Concept drift refers to a change in data distribution over time. It may occur in the medical domain, for example when medical sensors measuring for general healthcare or rehabilitation switch their roles to ICU emergency operations when required. Detecting concept drifts becomes trickier when the class distributions in data are skewed, which is often true for e-health data from medical sensors. Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate, and it does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experimentation was done in Massive Online Analysis (MOA) on 48 synthetic datasets customized to possess the capabilities of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.
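Error-rate drift detectors of this kind monitor a classifier's running error rate and raise a warning or drift signal when it rises well above its best observed level. The sketch below implements the classic Drift Detection Method (DDM) rule as a point of reference; it is not RDDM or ERDDM, whose additional mechanisms the abstract does not describe. The class name, the state labels, and the thresholds of 2 and 3 standard deviations with a 30-instance minimum are the commonly used defaults, stated here as assumptions.

```python
import math


class DDMSketch:
    """Classic DDM rule: compare p + s against the best (minimum) p + s seen."""

    def __init__(self, min_instances: int = 30,
                 warn_level: float = 2.0, drift_level: float = 3.0):
        self.min_instances = min_instances
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)   # its standard deviation
        if self.n < self.min_instances:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                        # start learning the new concept
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


detector = DDMSketch()
# 200 outcomes at a 20% error rate, then the classifier starts failing every time.
stream = [i % 5 == 0 for i in range(1, 201)] + [True] * 100
states = [detector.update(err) for err in stream]
```

On this toy stream the detector stays stable through the first concept, passes through a warning phase shortly after the change, and then signals drift. Because the rule tracks overall error only, a minority class can degrade badly without moving p much, which is the class-imbalance weakness the abstract sets out to address.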


Subjects
Delivery of Health Care , Internet of Things , Telemedicine , Algorithms , Data Mining
16.
Sensors (Basel) ; 20(20)2020 Oct 14.
Article in English | MEDLINE | ID: mdl-33066579

ABSTRACT

In the modern era of digitization, analysis in the Internet of Things (IoT) environment demands a brisk amalgamation of domains such as high-dimension (image) data sensing technologies, robust internet connection (4G or 5G) and dynamic (adaptive) deep learning approaches. This is required for a broad range of indispensable intelligent applications, like intelligent healthcare systems. Dynamic image classification is one of the major areas of concern for researchers, which may take place during analysis under the IoT environment. Dynamic image classification is associated with several temporal data perturbations (such as the novel class arrival and class evolution issues) which cause a massive classification deterioration in the deployed classification models and make them ineffective. Therefore, this study addresses such temporal inconsistencies (novel class arrival and class evolution) and proposes an adapted deep learning framework (ameliorated adaptive convolutional neural network (CNN) ensemble framework), which handles the novel class arrival and class evolution issues during dynamic image classification. The proposed framework is an improved version of a previous adaptive CNN ensemble with additional online training (OT) and online classifier update (OCU) modules. The OT module is a clustering-based approach which uses the Euclidean distance and silhouette method to determine the potential new classes, whereas the OCU updates the weights of the existing instances of the ensemble with newly arrived samples. The proposed framework showed the desirable classification improvement under non-stationary scenarios for the benchmark (CIFAR10) and real (ISIC 2019: Skin disease) data streams. Also, the proposed framework outperformed state-of-the-art shallow learning and deep learning models. The results have shown the effectiveness and proven the diversity of the proposed framework to adapt to new concept changes during dynamic image classification.
In future work, the authors of this study aim to develop an IoT-enabled adaptive intelligent dermoscopy device (for dermatologists). Therefore, further improvement in classification accuracy (for the real dataset) is the future concern of this study.

17.
J Biomed Inform ; 89: 1-10, 2019 01.
Article in English | MEDLINE | ID: mdl-30468912

ABSTRACT

OBJECTIVES: Finding recent clinical studies that warrant changes in clinical practice ("high impact" clinical studies) in a timely manner is very challenging. We investigated a machine learning approach to find recent studies with high clinical impact to support clinical decision making and literature surveillance. METHODS: To identify recent studies, we developed our classification model using time-agnostic features that are available as soon as an article is indexed in PubMed®, such as journal impact factor, author count, and study sample size. Using a gold standard of 541 high impact treatment studies referenced in 11 disease management guidelines, we tested the following null hypotheses: (1) the high impact classifier with time-agnostic features (HI-TA) performs equivalently to PubMed's Best Match sort and a MeSH-based Naïve Bayes classifier; and (2) HI-TA performs equivalently to the high impact classifier with both time-agnostic and time-sensitive features (HI-TS) enabled in a previous study. The primary outcome for both hypotheses was mean top 20 precision. RESULTS: The differences in mean top 20 precision between HI-TA and three baselines (PubMed's Best Match, a MeSH-based Naïve Bayes classifier, and HI-TS) were not statistically significant (12% vs. 3%, p = 0.101; 12% vs. 11%, p = 0.720; 12% vs. 25%, p = 0.094, respectively). Recall of HI-TA was low (7%). CONCLUSION: HI-TA had equivalent performance to state-of-the-art approaches that depend on time-sensitive features. With the advantage of relying only on time-agnostic features, the proposed approach can be used as an adjunct to help clinicians identify recent high impact clinical studies to support clinical decision-making. However, low recall limits the use of HI-TA for literature surveillance.
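The primary outcome above, mean top 20 precision, can be computed as in the following sketch. The function names and the toy identifiers are hypothetical, and dividing by k even when fewer than k results are returned is one common convention assumed here; the abstract does not say which convention the study used.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str],
                   k: int = 20) -> float:
    """Fraction of the top-k ranked items that are in the gold standard."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]],
                        k: int = 20) -> float:
    """Average precision@k over several searches (e.g. one per guideline topic)."""
    scores: List[float] = [precision_at_k(r, rel, k) for r, rel in runs]
    return sum(scores) / len(scores)


# Toy example with k = 4: two searches, each with its own gold standard.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),   # 2 of 4 relevant
    (["e", "f", "g", "h"], {"h"}),        # 1 of 4 relevant
]
score = mean_precision_at_k(runs, k=4)
```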


Subjects
Clinical Decision-Making , Machine Learning , PubMed , Publications/classification , Bayes Theorem
18.
Sensors (Basel) ; 19(14)2019 Jul 20.
Article in English | MEDLINE | ID: mdl-31330790

ABSTRACT

Real-time and long-term behavioural monitoring systems in precision livestock farming have huge potential to improve welfare and productivity for the better health of farm animals. However, some of the biggest challenges for long-term monitoring systems relate to "concept drift", which occurs when systems are presented with challenging new or changing conditions, and/or in scenarios where training data is not accurately reflective of live sensed data. This study presents a combined offline algorithm and online learning algorithm which deals with concept drift and is deemed by the authors as a useful mechanism for long-term in-the-field monitoring systems. The proposed algorithm classifies three relevant sheep behaviours using information from an embedded edge device that includes tri-axial accelerometer and tri-axial gyroscope sensors. The proposed approach is for the first time reported in precision livestock behavior monitoring and demonstrates improvement in classifying relevant behaviour in sheep, in real-time, under dynamically changing conditions.


Subjects
Agriculture , Behavior, Animal/physiology , Livestock , Sheep/physiology , Algorithms , Animals , Environment
19.
Entropy (Basel) ; 21(1)2019 Jan 01.
Article in English | MEDLINE | ID: mdl-33266741

ABSTRACT

In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature and the likelihood of drastic structural or concept changes in the markets. Traditional systems are unable or slow to adapt to these changes. Ensemble-based systems are widely known for their good results predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF (Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest for evolving data streams, adding on top a mechanism to store and handle a shared collection of inactive trees, called concept history, which holds memories of the way market operators reacted in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by replacing active trees with the best available alternative: either a previously stored tree from the concept history or a newly trained background tree. Both mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The experimental validation of the algorithm is based on the prediction of price movement directions one second ahead in the SPDR (Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked against other popular methods from the incremental online machine learning literature and is able to achieve competitive results.

20.
Entropy (Basel) ; 20(10)2018 Oct 10.
Article in English | MEDLINE | ID: mdl-33265863

ABSTRACT

We introduce a modeling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: In the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ). In the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e., the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics which have been used frequently for the exact description of training dynamics in stationary environments. Extensions of the approach allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient based training of layered neural networks for regression.
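For readers unfamiliar with prototype-based LVQ, the basic online update can be sketched as below. This shows the textbook LVQ1 rule (move the nearest prototype toward a correctly labelled example and away from a mislabelled one); the abstract does not say which LVQ variant or learning-rate schedule the paper analyses, so the rule, the function name, and the fixed `learning_rate` are assumptions.

```python
from typing import List, Sequence


def lvq1_step(prototypes: List[List[float]], proto_labels: Sequence[int],
              x: Sequence[float], y: int, learning_rate: float = 0.1) -> int:
    """Apply one LVQ1 update in place and return the index of the winner."""
    # Winner = prototype with the smallest squared Euclidean distance to x.
    distances = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x))
                 for w in prototypes]
    winner = min(range(len(prototypes)), key=distances.__getitem__)
    # Attract if the winner's label matches the example, repel otherwise.
    sign = 1.0 if proto_labels[winner] == y else -1.0
    prototypes[winner] = [w_i + sign * learning_rate * (x_i - w_i)
                          for w_i, x_i in zip(prototypes[winner], x)]
    return winner


protos = [[0.0, 0.0], [1.0, 1.0]]
labels = [0, 1]
winner = lvq1_step(protos, labels, x=[0.2, 0.0], y=0, learning_rate=0.5)
```

Because each step uses only the current example, the prototypes can keep following a target that changes while training continues, which is the tracking behaviour the abstract reports for LVQ under drift.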
