1.
Sensors (Basel); 24(15), 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39123809

ABSTRACT

We live in the era of large-scale data analysis, where processing vast datasets has become essential for uncovering valuable insights across many domains of our lives. Machine learning (ML) algorithms offer powerful tools for processing and analyzing this abundance of information. However, the considerable time and computational resources needed to train ML models pose significant challenges, especially within cascade schemes, because of the iterative nature of the training algorithms, the complexity of feature extraction and transformation, and the size of the datasets involved. This paper proposes a modification of the existing ML-based cascade scheme for analyzing large biomedical datasets by incorporating principal component analysis (PCA) at each level of the cascade. The number of principal components replacing the initial inputs was chosen to retain 95% of the variance. We also enhanced the training and application algorithms and demonstrated the effectiveness of the modified cascade scheme through a comparative analysis, which showed a significant reduction in training time together with improved generalization and accuracy on large data. The improved generalization stems from discarding non-significant independent attributes, which further strengthens the scheme's performance in the intelligent analysis of large data.
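As a rough illustration of the scheme described in this abstract, the sketch below builds a two-level cascade in which each level first projects its inputs onto the principal components that retain 95% of the variance and then passes its prediction to the next level as an additional attribute. The synthetic data, the Ridge regressor, and the number of levels are assumptions made for the example, not the authors' configuration.

```python
# Minimal sketch: a two-level cascade where every level reduces its inputs with PCA
# retaining 95% of the variance and appends its prediction as an extra attribute
# for the next level. Data, regressor, and level count are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# approximately low-rank inputs so that PCA can actually compress them
X, y = make_regression(n_samples=5000, n_features=200, effective_rank=20,
                       tail_strength=0.1, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

levels, cur = [], X_tr
for _ in range(2):                               # two cascade levels
    pca = PCA(n_components=0.95).fit(cur)        # keep components for 95% variance
    model = Ridge().fit(pca.transform(cur), y_tr)
    levels.append((pca, model))
    # this level's prediction becomes an extra input attribute for the next level
    cur = np.column_stack([cur, model.predict(pca.transform(cur))])

def cascade_predict(levels, X):
    cur = X
    for pca, model in levels:
        out = model.predict(pca.transform(cur))
        cur = np.column_stack([cur, out])
    return out

print("test MSE:", mean_squared_error(y_te, cascade_predict(levels, X_te)))
```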

2.
Sci Rep; 14(1): 12947, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38839889

ABSTRACT

Modern healthcare is characterized by large volumes of tabular data for monitoring and diagnosing a patient's condition. In addition, modern data engineering methods allow a large number of features to be synthesized from images or signals and presented in tabular form. High-precision, high-speed processing of such large volumes of medical data requires artificial intelligence tools. A linear machine learning model cannot analyze such data accurately, while traditional bagging, boosting, or stacking ensembles typically require significant computing power and time. In this paper, the authors propose a method for analyzing large sets of medical data based on a linear ensemble with a non-iterative learning algorithm. The basic node of the new ensemble is an extended-input SGTM neural-like structure, which provides high-speed data processing at each level of the ensemble. Prediction accuracy is increased by dividing the large dataset into parts, each of which is analyzed by a node of the ensemble structure, and by taking the output signal of the previous ensemble level into account as an additional attribute at the next one. This design of the ensemble structure provides both a significant increase in prediction accuracy for large medical datasets and a significant reduction in training time. Experimental studies on a large medical dataset, as well as a comparison with existing machine learning methods, confirmed the high efficiency of the developed ensemble structure for the prediction task.
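A simplified sketch of the ensemble described above, under stated assumptions: the large training set is split into parts, each part trains one node with a closed-form (non-iterative) fit, and each level's output is passed to the next level as an extra attribute. Ordinary least squares stands in for the authors' extended-input SGTM neural-like structure; the data and the number of parts are illustrative.

```python
# Simplified sketch of the described ensemble: split the large training set into
# parts, train one node per part with a non-iterative (closed-form) fit, and feed
# each level's output to the next level as an extra attribute. LinearRegression
# stands in for the extended-input SGTM neural-like structure (assumption).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=30000, n_features=40, noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

n_parts = 5                                    # number of ensemble levels/nodes
parts_X = np.array_split(X_tr, n_parts)
parts_y = np.array_split(y_tr, n_parts)

nodes = []
for Xi, yi in zip(parts_X, parts_y):
    cur = Xi
    for node in nodes:
        # outputs of the earlier levels become additional attributes of this part
        cur = np.column_stack([cur, node.predict(cur)])
    nodes.append(LinearRegression().fit(cur, yi))  # closed-form, non-iterative fit

def ensemble_predict(nodes, X):
    cur = X
    for node in nodes:
        out = node.predict(cur)
        cur = np.column_stack([cur, out])
    return out

print("test MSE:", mean_squared_error(y_te, ensemble_predict(nodes, X_te)))
```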


Subjects
Algorithms, Machine Learning, Humans, Data Analysis, Delivery of Health Care, Artificial Intelligence, Neural Networks (Computer)
3.
Math Biosci Eng; 20(7): 13398-13414, 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37501493

ABSTRACT

Biomedical data analysis is essential in modern diagnosis, treatment, and patient monitoring. The large volumes of data that characterize this area require simple yet accurate and fast methods of intelligent analysis to improve the quality of medical services. Existing machine learning (ML) methods either require many resources (time, memory, energy) when processing large datasets or deliver accuracy that is insufficient for a given applied task. In this paper, we developed a new, more accurate ensemble model for solving approximation problems on large biomedical datasets. The model is based on cascading ML methods and response-surface linearization principles. In addition, we used Ito decomposition as a means of nonlinearly expanding the inputs at each level of the model. Support Vector Regression (SVR) with a linear kernel was chosen as the weak learner because of the significant advantages it demonstrates over existing methods. The training and application procedures of the developed SVR-based cascade model are described, and a flowchart of its implementation is presented. The modeling was carried out on a large real-world tabular set of biomedical data. The task of predicting individuals' heart rate was solved; heart rate makes it possible to assess a person's stress level and is an essential indicator in various applied fields. The optimal operating parameters of the SVR-based cascade model were selected experimentally. The authors show that the developed model provides more than 20 times higher accuracy (in terms of Mean Squared Error (MSE)), as well as a significantly shorter training procedure, compared with the most accurate of the existing methods considered.
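The sketch below illustrates the cascade idea under stated assumptions: a degree-2 polynomial expansion stands in for the Ito decomposition of the inputs at each level, and LinearSVR approximates SVR with a linear kernel for speed; the synthetic data, the three levels, and the hyperparameters are not the authors' settings.

```python
# Sketch of a cascade of linear-kernel SVR nodes with nonlinear input expansion at
# every level. PolynomialFeatures(degree=2) stands in for the Ito decomposition,
# and LinearSVR approximates SVR with a linear kernel (illustrative assumptions).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=8000, n_features=10, noise=8.0, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2)

levels, cur = [], X_tr
for _ in range(3):                                   # three cascade levels
    node = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                         StandardScaler(),
                         LinearSVR(C=1.0, max_iter=10000))
    node.fit(cur, y_tr)
    levels.append(node)
    # this level's response extends the next level's inputs
    cur = np.column_stack([cur, node.predict(cur)])

def cascade_predict(levels, X):
    cur = X
    for node in levels:
        out = node.predict(cur)
        cur = np.column_stack([cur, out])
    return out

print("test MSE:", mean_squared_error(y_te, cascade_predict(levels, X_te)))
```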


Subjects
Data Analysis, Medical Informatics, Support Vector Machine, Humans
4.
Math Biosci Eng; 18(3): 2599-2613, 2021 Mar 17.
Article in English | MEDLINE | ID: mdl-33892562

ABSTRACT

The paper considers the problem of handling short sets of medical data. Solving this problem effectively would make it possible to address numerous classification and regression tasks in health decision support systems when data are limited. Many such tasks arise in various fields of medicine. The authors improved a regression method of data analysis based on artificial neural networks by introducing additional elements into the formula for calculating the output signal of the existing RBF-based input-doubling method. This improvement averages the result, as is typical for ensemble methods, and allows errors of opposite signs in the predicted values to compensate for one another. These two advantages make it possible to significantly increase the accuracy of methods of this class. Notably, the training time of the improved method remains the same as that of the existing one. Experimental modeling was performed on a real, short medical dataset: a regression task in rheumatology was solved using only 77 observations. The optimal parameters of the method, which provide the highest prediction accuracy in terms of MAE and RMSE, were selected experimentally, and its efficiency was compared with that of other methods of this class. The proposed RBF-based additive input-doubling method achieved the highest accuracy among those considered. The method can be modified by using other nonlinear artificial intelligence tools to implement its training and application algorithms, and such methods can be applied in various fields of medicine.
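The following is a loose sketch of an input-doubling scheme for short samples; the exact output formula of the authors' additive variant is not reproduced here. It assumes that the extended input is a concatenated pair of observations, that the target is the difference of their responses, and that a new sample is predicted by pairing it with every training observation and averaging the corrected estimates. KernelRidge with an RBF kernel stands in for the RBF-based tool.

```python
# Loose sketch of an "input-doubling" scheme for a small sample (assumed reading,
# not the published algorithm): every ordered pair of training observations forms
# an extended input [x_i, x_j] with target y_j - y_i; for a new sample, predicted
# differences are added to the known responses and averaged, so errors of opposite
# signs can cancel. KernelRidge with an RBF kernel stands in for the RBF-based tool.
import numpy as np
from itertools import product
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=77, n_features=8, noise=3.0, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=3)

# build the "doubled" training set: one row per ordered pair of training samples
pairs = list(product(range(len(X_tr)), repeat=2))
X_pairs = np.array([np.concatenate([X_tr[i], X_tr[j]]) for i, j in pairs])
y_pairs = np.array([y_tr[j] - y_tr[i] for i, j in pairs])

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.05).fit(X_pairs, y_pairs)

def predict_one(x_new):
    # pair the new sample with every training observation and average the estimates
    Xq = np.array([np.concatenate([xi, x_new]) for xi in X_tr])
    return float(np.mean(y_tr + model.predict(Xq)))

y_pred = np.array([predict_one(x) for x in X_te])
print("test MAE:", mean_absolute_error(y_te, y_pred))
```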


Subjects
Artificial Intelligence, Clinical Medicine, Algorithms, Neural Networks (Computer)
5.
Sensors (Basel); 20(9), 2020 May 04.
Article in English | MEDLINE | ID: mdl-32375400

ABSTRACT

The purpose of this paper is to improve the accuracy of prediction tasks for recovering missing IoT data. To achieve this, the authors developed a new ensemble of neural network tools. It consists of two successive General Regression Neural Networks (GRNNs) and one neural-like structure of the Successive Geometric Transformation Model (SGTM). The principle of constructing the ensemble topology from two successively connected GRNNs supplemented with an SGTM neural-like structure is mathematically substantiated, which improves the accuracy of the prediction results. The effectiveness of the method rests on replacing the plain summation of the two GRNN outputs with a weighted summation, which improves the accuracy of the ensemble as a whole. A detailed algorithmic implementation of the ensemble method, as well as a flowchart of its operation, is presented. The ensemble's operating parameters are determined by brute-force optimization. Using the developed ensemble method, the task of completing partially missing values in a real air-quality monitoring dataset collected by an IoT device is solved. A comparison with existing methods shows that the developed ensemble achieves the highest accuracy (in terms of Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE)) among the most similar methods of this class.
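A minimal sketch of the weighted-summation idea, assuming a plain Gaussian-kernel (Nadaraya-Watson) implementation of the GRNN and a brute-force search for the summation weight on a validation split; the SGTM stage of the authors' topology is omitted, and synthetic data stands in for the air-monitoring set.

```python
# Minimal sketch: two GRNNs (implemented as Gaussian-kernel weighted averages)
# combined by a weighted summation, with the weight found by brute force on a
# validation split. Synthetic data and smoothing factors are assumptions, and the
# SGTM stage of the original topology is omitted.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

class GRNN:
    """General Regression Neural Network: Gaussian-kernel weighted average of targets."""
    def __init__(self, sigma=1.0):
        self.sigma = sigma
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, X):
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)  # squared distances
        w = np.exp(-d2 / (2 * self.sigma ** 2))                   # kernel weights
        return (w @ self.y) / np.clip(w.sum(axis=1), 1e-12, None)

X, y = make_regression(n_samples=2000, n_features=6, noise=4.0, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=4)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.25, random_state=4)

g1 = GRNN(sigma=0.5).fit(X_fit, y_fit)   # sharper, more local estimator
g2 = GRNN(sigma=2.0).fit(X_fit, y_fit)   # smoother, more global estimator

# brute-force search for the summation weight on the validation split
best_w, best_err = 0.5, np.inf
for w in np.linspace(0.0, 1.0, 21):
    pred_val = w * g1.predict(X_val) + (1 - w) * g2.predict(X_val)
    err = np.sqrt(mean_squared_error(y_val, pred_val))
    if err < best_err:
        best_w, best_err = w, err

pred_te = best_w * g1.predict(X_te) + (1 - best_w) * g2.predict(X_te)
print("weight:", best_w, "test RMSE:", np.sqrt(mean_squared_error(y_te, pred_te)))
```

Using two different smoothing factors is what makes the weighted sum worthwhile here: the grid search trades off the sharper local estimator against the smoother global one.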
