1.
PLoS One ; 19(1): e0295036, 2024.
Article En | MEDLINE | ID: mdl-38206967

The wheat crop, which fulfills 35% of human food demand, faces several problems due to a lack of transparency, security, reliability, and traceability in the existing agriculture supply chain. Many systems have been developed for the agriculture supply chain to overcome such issues; however, monopolistic centralized control is the biggest hurdle to realizing the use of such systems. As a result, consumers have come to trust branded products and reject other products because traceable supply chain information is lacking. This study proposes a blockchain-based framework for supply chain traceability which provides trustable, transparent, secure, and reliable services for the wheat crop. A crypto token called wheat coin (WC) is introduced to keep track of transactions among the stakeholders of the wheat supply chain. Moreover, an initial coin offering (ICO) of WC, crypto wallets, and an economic model are proposed. Furthermore, a smart contract-based transaction system is devised for the transparency of wheat crop transactions and the conversion of WC to fiat and vice versa. The InterPlanetary File System (IPFS) is used to improve data availability, security, and transparency; it stores the encrypted private data of farmers, businesses, and merchants. Lastly, experimental results show that the proposed framework performs better than previous crop supply chain solutions in terms of latency to add blocks, transactions per minute, average gas charge per transaction, and transaction verification time. A performance comparison with Bitcoin and Ethereum also shows the superior performance of the proposed system.


Blockchain , Cryptococcus neoformans , Cryptosporidiosis , Humans , Triticum , Reproducibility of Results , Agriculture , Commerce
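The tamper-evidence that a blockchain ledger gives to supply chain transactions can be illustrated with a minimal hash chain. This is a sketch only, not the framework described in the abstract: the block layout, the function names, and the sample transfers are all hypothetical, and it omits consensus, smart contracts, tokens, and IPFS storage.

```python
import hashlib
import json


def block_hash(index, prev_hash, transaction):
    """Return the SHA-256 digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "tx": transaction},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transaction):
    """Append a transaction as a new block linked to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "tx": transaction,
        "hash": block_hash(index, prev_hash, transaction),
    })
    return chain


def is_valid(chain):
    """Check every stored hash and every link to the preceding block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev_hash"], block["tx"]):
            return False
    return True


# Hypothetical transfers between supply chain stakeholders.
ledger = []
append_block(ledger, {"from": "farmer", "to": "miller", "wheat_kg": 500})
append_block(ledger, {"from": "miller", "to": "retailer", "wheat_kg": 480})
```

Editing any recorded transaction after the fact changes its digest, so `is_valid(ledger)` stops returning `True`. That property is what makes the recorded history traceable.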
2.
PeerJ Comput Sci ; 10: e1697, 2024.
Article En | MEDLINE | ID: mdl-38259896

Public concern regarding health systems has surged over the last two years due to the COVID-19 outbreak. Accordingly, medical professionals and health-related institutions reach out to patients and seek feedback to analyze, monitor, and uplift medical services. Such views and perceptions are often shared on social media platforms like Facebook, Instagram, Twitter, etc. Twitter is the most popular platform and is commonly used by researchers for instant access to real-time news, opinions, and discussion. Its trending hashtags (#) and viral content make it an ideal hub for monitoring public opinion on a variety of topics. In this study, tweets are extracted using three hashtags: #healthcare, #healthcare services, and #medical facilities. Location and tweet sentiment analysis are also considered. Several recent studies applied ML and DL models to Twitter datasets, but their results show lower accuracy. In addition, those studies did not perform extensive comparative analysis and lack validation. This study addresses two research questions: first, what are the sentiments of people toward medical services worldwide? and second, how effective are machine learning and deep learning approaches for the classification of sentiment on healthcare tweets? Experiments are performed using several well-known machine learning models including support vector machine, logistic regression, Gaussian naive Bayes, extra tree classifier (ETC), k nearest neighbor, random forest, decision tree, and AdaBoost. In addition, this study proposes a transfer learning-based LSTM-ETC model that effectively predicts the customer's satisfaction level from the healthcare dataset. Results indicate that the ETC model performs best among the individual machine learning models with a 0.88 accuracy score, while the proposed model outperforms it with a 0.95 accuracy score. 
Predominantly, the people are happy about the provided medical services as the ratio of the positive sentiments is substantially higher than the negative sentiments. The sentiments, either positive or negative, play a crucial role in making important decisions through customer feedback and enhancing quality.

3.
Sensors (Basel) ; 23(21)2023 Nov 03.
Article En | MEDLINE | ID: mdl-37960657

The Internet of Things (IoT) is an innovative technology that presents effective and attractive solutions to revolutionize various domains. Numerous solutions based on the IoT have been designed to automate industries, manufacturing units, and production houses to mitigate human involvement in hazardous operations. Owing to the large number of publications in the IoT paradigm, in particular those focusing on industrial IoT (IIoT), a comprehensive survey is significantly important to provide insights into recent developments. This survey presents the workings of the IoT-based smart industry and its major components and proposes the state-of-the-art network infrastructure, including structured layers of IIoT architecture, IIoT network topologies, protocols, and devices. Furthermore, the relationship between IoT-based industries and key technologies is analyzed, including big data storage, cloud computing, and data analytics. A detailed discussion of IIoT-based application domains, smartphone application solutions, and sensor- and device-based IIoT applications developed for the management of the smart industry is also presented. Consequently, IIoT-based security attacks and their relevant countermeasures are highlighted. By analyzing the essential components, their security risks, and available solutions, future research directions regarding the implementation of IIoT are outlined. Finally, a comprehensive discussion of open research challenges and issues related to the smart industry is also presented.

4.
Cogn Neurodyn ; 17(5): 1229-1259, 2023 Oct.
Article En | MEDLINE | ID: mdl-37786662

Driving a vehicle is a complex, multidimensional, and potentially risky activity demanding full mobilization and utilization of physiological and cognitive abilities. Drowsiness, often caused by stress, fatigue, and illness, reduces cognitive capabilities, which impairs drivers' performance and causes many accidents. Drowsiness-related road accidents are associated with trauma, physical injuries, and fatalities, and are often accompanied by economic loss. Drowsiness-related crashes are most common among young people and night shift workers. Real-time and accurate driver drowsiness detection is necessary to bring down the drowsy driving accident rate. Many researchers have worked on systems that detect drowsiness using different features related to vehicles and drivers' behavior, as well as physiological measures. Keeping in view the rising trend in the use of physiological measures, this study presents a comprehensive and systematic review of recent techniques to detect driver drowsiness using physiological signals. Different sensors augmented with machine learning are utilized, which subsequently yield better results. These techniques are analyzed with respect to several aspects, such as the data collection sensor, the environment (controlled or dynamic), and the experimental setup (real traffic or driving simulators). Similarly, by investigating the type of sensors involved in experiments, this study discusses the advantages and disadvantages of existing studies and points out the research gaps. Finally, observations are offered to provide future research directions for drowsiness detection techniques based on physiological signals.

5.
Diagnostics (Basel) ; 13(18)2023 Sep 08.
Article En | MEDLINE | ID: mdl-37761248

A novel approach is presented in this study for the classification of lower limb disorders, with a specific emphasis on the knee, hip, and ankle. The research employs gait analysis and the extraction of PoseNet features from video data in order to effectively identify and categorize these disorders. The PoseNet algorithm facilitates the extraction of key body joint movements and positions from videos in a non-invasive and user-friendly manner, thereby offering a comprehensive representation of lower limb movements. The features that are extracted are subsequently standardized and employed as inputs for a range of machine learning algorithms, such as Random Forest, Extra Tree Classifier, Multilayer Perceptron, Artificial Neural Networks, and Convolutional Neural Networks. The models undergo training and testing processes using a dataset consisting of 174 real patients and normal individuals collected at the Tehsil Headquarter Hospital Sadiq Abad. The evaluation of their performance is conducted through the utilization of K-fold cross-validation. The findings exhibit a notable level of accuracy and precision in the classification of various lower limb disorders. Notably, the Artificial Neural Networks model achieves the highest accuracy rate of 98.84%. The proposed methodology exhibits potential in enhancing the diagnosis and treatment planning of lower limb disorders. It presents a non-invasive and efficient method of analyzing gait patterns and identifying particular conditions.
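The K-fold cross-validation mentioned in the abstract above can be sketched as follows. The abstract does not state the value of K, so the `k=10` used here is an assumption, as are the function names. The sketch splits indices contiguously and does not shuffle or stratify, which a real evaluation would normally do.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(n_samples, k, fit_and_score):
    """Average the score returned by fit_and_score(train, test) over k folds."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# 174 subjects as in the abstract; k = 10 is an assumed value.
fold_sizes = [len(test) for _, test in k_fold_indices(174, 10)]
```

Every sample is used for testing exactly once, so the averaged score reflects performance on data the model did not see during training in that fold.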

6.
Sensors (Basel) ; 23(18)2023 Sep 08.
Article En | MEDLINE | ID: mdl-37765813

Despite significant improvement in prognosis, myocardial infarction (MI) remains a major cause of morbidity and mortality around the globe. MI is a life-threatening cardiovascular condition that requires prompt diagnosis and appropriate treatment. The primary objective of this research is to identify instances of anterior and inferior myocardial infarction by utilizing data obtained from Ultra-wideband radar technology in a hospital for patients of anterior and inferior MI. The collected data is preprocessed to extract spectral features. A novel feature engineering approach is designed to fuse temporal features and class prediction probability features derived from the spectral feature dataset. Several well-known machine learning models are implemented and fine-tuned to obtain optimal performance in the detection of anterior and inferior MI. The results demonstrate that integration of the fused feature set with machine learning models results in a notable improvement in both the accuracy and precision of MI detection. Notably, random forest (RF) and k-nearest neighbor showed superb performance with an accuracy of 98.8%. For demonstrating the capacity of models to generalize, K-fold cross-validation is carried out, wherein RF exhibits a mean accuracy of 99.1%. Furthermore, the examination of computational complexity indicates a low computational complexity, thereby indicating computational efficiency.


Inferior Wall Myocardial Infarction , Myocardial Infarction , Humans , Radar , Myocardial Infarction/diagnostic imaging , Cluster Analysis , Machine Learning
7.
PLoS One ; 18(9): e0286541, 2023.
Article En | MEDLINE | ID: mdl-37768959

COVID-19 affected the world's economy severely and increased the inflation rate in both developed and developing countries. COVID-19 also affected the financial markets and crypto markets significantly; however, some crypto markets flourished and reached their peak during the pandemic era. This study analyzes the impact of COVID-19 on public opinion and sentiments regarding the financial markets and crypto markets. It conducts sentiment analysis on tweets related to financial markets and crypto markets posted during COVID-19 peak days. Using sentiment analysis, it investigates people's sentiments regarding investment in these markets during COVID-19. In addition, damage analysis in terms of market value is carried out, along with an analysis of the worst time for financial and crypto markets. For analysis, the data is extracted from Twitter using the SNSscraper library. This study proposes a hybrid model called CNN-LSTM (convolutional neural network-long short-term memory model) for sentiment classification. CNN-LSTM outperforms the other models, with F1 scores of 0.89 and 0.92 for crypto and financial markets, respectively. Moreover, topic extraction from the tweets is also performed along with the sentiments related to each topic.


COVID-19 , Cryptococcus neoformans , Cryptosporidiosis , Social Media , Humans , Sentiment Analysis , COVID-19/epidemiology , Gene Library
8.
Sensors (Basel) ; 23(16)2023 Aug 08.
Article En | MEDLINE | ID: mdl-37631555

Railway track faults may lead to railway accidents and cause human and financial loss. Spatial, temporal, and weather elements, together with wear and tear, lead to ballast problems, loose nuts, misalignment, and cracks, which in turn lead to accidents. Manual inspection of such defects is time-consuming and prone to errors. Automatic inspection provides a fast, reliable, and unbiased solution. However, highly accurate fault detection is challenging due to the lack of public datasets, noisy data, inefficient models, etc. To obtain better performance, this study presents a novel approach that relies on mel frequency cepstral coefficient features from acoustic data. The primary objective of this study is to increase fault detection performance. In addition to designing an ensemble model, we use chi-square (chi2) to select features that have high importance with respect to the target class. Extensive experiments were carried out to analyze the efficiency of the proposed approach. The experimental results suggest that using 60 features (40 original features and 20 chi2 features) produces optimal results regarding both accuracy and computational complexity. A mean accuracy score of 0.99 was obtained using the proposed approach with machine learning models on the collected data. Moreover, this performance was significantly better than that of existing approaches; however, the performance of models may vary in real-world settings.
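Chi-square feature selection, as named in the abstract above, ranks features by how strongly they depend on the class label. The sketch below computes the Pearson chi-square statistic from a contingency table and keeps the top-scoring columns. It assumes features that are already categorical (continuous features such as cepstral coefficients would first need binning), and the function names and toy data are hypothetical, not taken from the study.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k columns with the highest chi-square score."""
    ranked = sorted(
        columns,
        key=lambda name: chi_square_score(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data: one column tracks the label exactly, the other is unrelated.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "informative": [1, 1, 1, 1, 0, 0, 0, 0],
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],
}
best = select_top_k(columns, labels, 1)
```

A column that is independent of the label scores zero, while a column that determines the label scores the sample count, so sorting by the statistic puts the most class-relevant features first.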

9.
Sensors (Basel) ; 23(15)2023 Aug 01.
Article En | MEDLINE | ID: mdl-37571624

Cricket has a massive global following and is ranked as the second most popular sport globally, with an estimated 2.5 billion fans. Batting requires quick decisions based on ball speed, trajectory, fielder positions, etc. Recently, computer vision and machine learning techniques have gained attention as potential tools to predict cricket strokes played by batters. This study presents a cutting-edge approach to predicting batsman strokes using computer vision and machine learning. The study analyzes eight strokes: pull, cut, cover drive, straight drive, backfoot punch, on drive, flick, and sweep. The study uses the MediaPipe library to extract features from videos and several machine learning and deep learning algorithms, including random forest (RF), support vector machine, k-nearest neighbors, decision tree, linear regression, and long short-term memory to predict the strokes. The study achieves an outstanding accuracy of 99.77% using the RF algorithm, outperforming the other algorithms used in the study. The k-fold validation of the RF model is 95.0% with a standard deviation of 0.07, highlighting the potential of computer vision and machine learning techniques for predicting batsman strokes in cricket. The study's results could help improve coaching techniques and enhance batsmen's performance in cricket, ultimately improving the game's overall quality.


Cricket Sport , Humans , Algorithms , Machine Learning , Support Vector Machine
10.
Multimed Tools Appl ; : 1-23, 2023 Mar 18.
Article En | MEDLINE | ID: mdl-37362743

With an ever-increasing number of mobile users, the development of mobile applications (apps) has become a potential market during the past decade. Billions of users download mobile apps for diverse uses from the Google Play Store, carry out tasks, and leave comments about their experience. Such reviews are replete with a variety of feedback that serves as a guide for the improvement of existing apps and as inspiration for novel mobile apps. However, app reviews are very broad and challenging to analyze. Such reviews, when segregated into different classes, guide the user in the selection of suitable apps. This study proposes a framework for analyzing the sentiment of reviews for apps of eight different categories such as shopping, sports, casual, etc. A large dataset comprising 251661 user reviews is scraped with the help of 'Regular Expression' and 'Beautiful Soup'. The framework uses different machine learning models along with term frequency-inverse document frequency (TF-IDF) for feature extraction. Extensive experiments are performed using preprocessing steps, as well as the stats feature of app reviews, to evaluate the performance of the models. Results indicate that combining the stats feature with TF-IDF shows better performance and that the support vector machine obtains the highest accuracy. Experimental results can potentially be used by other researchers to select appropriate models for the analysis of app reviews. In addition, the provided dataset is large, diverse, and balanced with eight categories and 59 app reviews and provides the opportunity to analyze reviews using state-of-the-art approaches.
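The TF-IDF weighting used for feature extraction in the abstract above can be sketched in a few lines. This version uses the plain textbook form, term frequency times log(N / document frequency), with whitespace tokenization. Library implementations commonly add smoothing and normalization, so their numbers will differ. The sample reviews are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented app reviews, for illustration only.
reviews = ["great app love it", "app crashes often", "love the app"]
weights = tf_idf(reviews)
```

The word "app" occurs in every review and therefore receives weight zero, while a word such as "crashes" that appears in only one review is weighted most heavily. This is the behavior that makes TF-IDF useful for separating classes of reviews.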

11.
PeerJ Comput Sci ; 9: e1193, 2023.
Article En | MEDLINE | ID: mdl-37346556

With the rise of social media platforms, sharing reviews has become a social norm in today's modern society. People check customer views on social networking sites about different fast food restaurants and food items before visiting the restaurants and ordering food. Restaurants can compete to better the quality of their offered items or services by carefully analyzing the feedback provided by customers. People tend to visit restaurants with a higher number of positive reviews. However, manually collecting feedback from customers for every product is a labor-intensive process, and the same is true for manually analyzing its sentiment. To overcome this, we use automated sentiment analysis, which extracts meaningful information from the data. Existing studies predominantly focus on machine learning models. As a consequence, the performance analysis of deep learning models, and of deep ensemble models in particular, is largely neglected. To this end, this study adopts several deep ensemble models including bidirectional long short-term memory and gated recurrent unit (BiLSTM+GRU), LSTM+GRU, GRU+recurrent neural network (GRU+RNN), and BiLSTM+RNN models using self-collected unstructured tweets. The performance of lexicon-based methods is compared with deep ensemble models for sentiment classification. In addition, the study makes use of Latent Dirichlet Allocation (LDA) modeling for topic analysis. For experiments, tweets about the top five fast food companies are collected: KFC, Pizza Hut, McDonald's, Burger King, and Subway. Experimental results reveal that deep ensemble models yield better results than the lexicon-based approach and that BiLSTM+GRU obtains the highest accuracy of 95.31% for the three-class problem. Topic modeling indicates that the highest number of negative sentiments is found for Subway restaurants, with high-intensity negative words. 
The majority of the people (49%) remain neutral regarding the choice of fast food, 31% seem to like fast food while the rest (20%) dislike fast food.

12.
PeerJ Comput Sci ; 9: e1353, 2023.
Article En | MEDLINE | ID: mdl-37346628

With the rise of social media, the dissemination of forged content and news has been on the rise. Consequently, fake news detection has emerged as an important research problem. Several approaches have been presented to discriminate fake news from real news; however, such approaches lack robustness for multi-domain datasets, especially within the context of Urdu news. In addition, some studies use datasets machine-translated from English to Urdu with Google Translate, without manual verification. This limits the wide use of such approaches for real-world applications. This study investigates these issues and proposes a fake news classifier for Urdu news. The collected dataset covers nine different domains and comprises 4097 news items. Experiments are performed using term frequency-inverse document frequency (TF-IDF) and bag of words (BoW) features in combination with n-grams. The major contribution of this study is the use of feature stacking, where feature vectors of preprocessed text and verbs extracted from the preprocessed text are combined. Support vector machine, k-nearest neighbor, and ensemble models like random forest (RF) and extra tree (ET) were used for bagging, while stacking was applied with ET and RF as base learners and logistic regression as the meta learner. To check the robustness of models, fivefold and independent set testing were employed. Experimental results indicate that stacking achieves 93.39%, 88.96%, 96.33%, 86.2%, and 93.17% scores for accuracy, specificity, sensitivity, MCC, ROC, and F1 score, respectively.
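Several of the scores reported in the abstract above (accuracy, specificity, sensitivity, MCC, and F1) can be computed directly from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The counts in the example are made up and are not the study's results, and the function assumes every denominator except the MCC one is nonzero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification scores from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall, or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts: 90 fake items caught, 20 missed, 10 false alarms.
scores = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

MCC uses all four cells of the matrix, so it remains informative when the two classes are imbalanced. That is one reason it is often reported alongside accuracy.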

13.
PeerJ Comput Sci ; 9: e1134, 2023.
Article En | MEDLINE | ID: mdl-37346732

Business collapse is a common event in economies, small and big alike. A firm's health is crucial to its stakeholders like creditors, investors, partners, etc., and predicting an upcoming financial crisis is important for devising appropriate strategies to avoid business collapse. Bankruptcy prediction has been regarded as a critical topic in the world of accounting and finance. Methodologies and strategies have been investigated in the research domain for predicting company bankruptcy more promptly and accurately. Conventionally, financial risk and bankruptcy have been predicted solely from historic financial data. However, CEOs also communicate verbally via press releases, and voice characteristics such as emotion and tone may reflect a company's success, according to anecdotal evidence. Companies' publicly available earnings call data is one of the main sources of information for understanding how businesses are doing and what the expectations are for the next quarters. An earnings call is a conference call between the management of a company and the media. During the call, management offers an overview of recent performance and provides a guide for the next quarter's expectations. The CEO's emotions can be extracted from the earnings call summary provided by the management using sentiment analysis. This article investigates the prediction of firms' health in terms of bankruptcy and non-bankruptcy based on emotions extracted from earnings calls and proposes a deep learning model in this regard. Features extracted from a long short-term memory (LSTM) network are used to train machine learning models. Results show that the models achieve a high score of 0.93 for both accuracy and F1 when trained on LSTM-extracted features from synthetic minority oversampling technique (SMOTE) balanced data. LSTM features provide better performance than traditional bag of words and TF-IDF features.

14.
PLoS One ; 18(4): e0284522, 2023.
Article En | MEDLINE | ID: mdl-37079536

Microorganisms make up approximately 60% of the earth's living matter, and the human body is home to millions of microorganisms. Microbes can pose threats to health and may lead to several diseases in humans like toxoplasmosis and malaria. Toxoplasmosis is widespread in humans, with a seroprevalence of 3.6-84% in sub-Saharan Africa. This necessitates an automated approach for microorganism detection. The primary objective of this study is to predict microorganisms in the human body. A novel hybrid microbes classifier (HMC) is proposed in this study which is based on a decision tree classifier and extra tree classifier using voting criteria. Experiments involve different machine learning and deep learning models for detecting ten different microscopic life forms. Results suggest that the proposed HMC approach achieves a 98% accuracy score, 98% geometric mean score, 97% precision score, and 97% Cohen Kappa score. The proposed model outperforms the other employed models, as well as existing state-of-the-art models. Moreover, k-fold cross-validation corroborates the results. The research helps microbiologists identify the type of microorganisms with high accuracy and may help prevent many diseases through early detection.


Algorithms , Machine Learning , Humans , Seroepidemiologic Studies
15.
Diagnostics (Basel) ; 13(6)2023 Mar 14.
Article En | MEDLINE | ID: mdl-36980404

Chronic obstructive pulmonary disease (COPD) is a severe and chronic ailment that is currently ranked as the third most common cause of mortality across the globe. COPD patients often experience debilitating symptoms such as chronic coughing, shortness of breath, and fatigue. Sadly, the disease frequently goes undiagnosed until it is too late, leaving patients without the care they desperately need. So, COPD detection at an early stage is crucial to prevent further damage to the lungs and improve quality of life. Traditional COPD detection methods often rely on physical examinations and tests such as spirometry, chest radiography, blood gas tests, and genetic tests. However, these methods may not always be accurate or accessible. One of the key vital signs for detecting COPD is the patient's respiration rate. However, it is crucial to consider a patient's medical and demographic characteristics simultaneously for better detection results. To address this issue, this study aims to detect COPD patients using artificial intelligence techniques. To achieve this goal, a novel framework is proposed that utilizes ultra-wideband (UWB) radar-based temporal and spectral features to build machine learning and deep learning models. This new set of temporal and spectral features is extracted from respiration data collected non-invasively from 1.5 m distance using UWB radar. Different machine learning and deep learning models are trained and tested on the collected dataset. The findings are promising, with a high accuracy score of 100% for COPD detection. This means that the proposed framework could potentially save lives by identifying COPD patients at an early stage. The k-fold cross-validation technique and performance comparison with the state-of-the-art studies are applied to validate its performance, ensuring that the results are robust and reliable. 
The high accuracy score achieved in the study implies that the proposed framework has the potential for the efficient detection of COPD at an early stage.

16.
Sensors (Basel) ; 23(3)2023 Jan 20.
Article En | MEDLINE | ID: mdl-36772250

With the advancement in information technology, digital data stealing and duplication have become easier. Over a trillion bytes of data are generated and shared on social media through the internet in a single day, and the authenticity of digital data is currently a major problem. Cryptography and image watermarking are domains that provide multiple security services, such as authenticity, integrity, and privacy. In this paper, a digital image watermarking technique is proposed that employs the least significant bit (LSB) and the Canny edge detection method. The proposed method provides better security services and is computationally less expensive, which is the demand of today's world. The major contribution of this method is that it finds suitable places for watermark embedding and provides additional watermark security by scrambling the watermark image. A digital image is divided into non-overlapping blocks, and the gradient is calculated for each block. Then convolution masks are applied to find the gradient direction and magnitude, and non-maximum suppression is applied. Finally, LSB is used to embed the watermark in the hysteresis step. Furthermore, additional security is provided by scrambling the watermark signal using our chaotic substitution box. The proposed technique is more secure because of LSB's high payload and because the watermark is embedded after a Canny edge detection filter. The Canny edge gradient direction and magnitude determine how many bits will be embedded. To test the performance of the proposed technique, several image processing and geometrical attacks are performed. The proposed method shows high robustness to image processing and geometrical attacks.
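The least significant bit (LSB) step named in the abstract above can be illustrated on a flat list of 8-bit pixel values. This sketch embeds the watermark bits sequentially into the first pixels. It does not reproduce the paper's Canny-edge-based choice of embedding locations or its chaotic substitution box scrambling, and the function names and sample values are hypothetical.

```python
def embed_lsb(pixels, bits):
    """Write one watermark bit into the least significant bit of each pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the cover signal")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(pixels, n_bits):
    """Read the watermark back from the lowest bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical 8-bit grayscale values and watermark bits.
cover = [200, 13, 77, 128, 255, 0, 34, 91]
watermark = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_lsb(cover, watermark)
recovered = extract_lsb(marked, len(watermark))
```

Each pixel changes by at most one intensity level, which is why LSB embedding is visually imperceptible. The same property makes the plain form fragile, which is the motivation for choosing embedding locations carefully.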

17.
Cancers (Basel) ; 15(3)2023 Jan 22.
Article En | MEDLINE | ID: mdl-36765642

Breast cancer is one of the most common invasive cancers in women and it continues to be a worldwide medical problem since the number of cases has significantly increased over the past decade. Breast cancer is the second leading cause of death from cancer in women. The early detection of breast cancer can save lives, but the traditional approach for detecting breast cancer needs various laboratory tests involving medical experts. To reduce human error and speed up breast cancer detection, an automatic system is required that performs the diagnosis accurately and in a timely manner. Despite the research efforts on automated systems for cancer detection, a wide gap exists between the desired and provided accuracy of current approaches. To overcome this issue, this research proposes an approach for breast cancer prediction by selecting the best fine needle aspiration features. To enhance the prediction accuracy, several feature selection techniques are applied and their efficacy analyzed, namely principal component analysis (PCA), singular value decomposition (SVD), and chi-square (Chi2). Extensive experiments are performed with different features and different feature set sizes to investigate the optimal feature set. Additionally, the influence of imbalanced and balanced data using the SMOTE approach is investigated. Six classifiers including random forest, support vector machine, gradient boosting machine, logistic regression, multilayer perceptron, and K-nearest neighbors (KNN) are tuned to achieve increased classification accuracy. Results indicate that KNN outperforms all other classifiers on the used dataset, achieving a 100% accuracy score both with 20 features using SVD and with the 15 most important features using PCA.
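The SMOTE balancing step mentioned in the abstract above creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified sketch of that idea, not a reference implementation: the neighbour count `k=3`, the fixed seed, and the toy points are assumptions, and it requires at least two minority samples.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature minority samples, for illustration only.
samples = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
new_points = smote(samples, 6)
```

Because each synthetic point lies on a line segment between two real minority samples, the new data stays inside the region the minority class already occupies. Oversampling is normally applied to the training split only, so that test data contains no synthetic points.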

18.
PLoS One ; 17(11): e0276525, 2022.
Article En | MEDLINE | ID: mdl-36350808

Maternal health is an important aspect of women's health during pregnancy, childbirth, and the postpartum period. Specifically, during pregnancy, different health factors like age, blood disorders, heart rate, etc. can lead to pregnancy complications. Detecting such health factors can alleviate the risk of pregnancy-related complications. This study aims to develop an artificial neural network-based system for predicting maternal health risks using health data records. A novel deep neural network architecture, DT-BiLTCN is proposed that uses decision trees, a bidirectional long short-term memory network, and a temporal convolutional network. Experiments involve using a dataset of 1218 samples collected from maternal health care, hospitals, and community clinics using the IoT-based risk monitoring system. Class imbalance is resolved using the synthetic minority oversampling technique. DT-BiLTCN provides a feature set to obtain high accuracy results which in this case are provided by the support vector machine with a 98% accuracy. Maternal health exploratory data analysis reveals that the health conditions which are the strongest indications of health risk during pregnancy are diastolic and systolic blood pressure, heart rate, and age of pregnant women. Using the proposed model, timely prediction of health risks associated with pregnant women can be made thus mitigating the risk of health complications which helps to save lives.


Maternal Health , Pregnancy Complications , Female , Pregnancy , Humans , Neural Networks, Computer , Support Vector Machine , Pregnancy Complications/epidemiology , Learning
19.
Sci Rep ; 12(1): 19999, 2022 11 21.
Article En | MEDLINE | ID: mdl-36411295

[Formula: see text]-Thalassemia is one of the dangerous causes of the high mortality rate in the Mediterranean countries. Substantial resources are required to save a [Formula: see text]-Thalassemia carrier's life, and early detection of thalassemia patients can enable appropriate treatment to increase the carrier's life expectancy. Being a genetic disease, it cannot be prevented; however, the analysis of several indicators in parents' blood can be used to detect disorders causing Thalassemia. Laboratory tests for Thalassemia, such as high-performance liquid chromatography, Complete Blood Count (CBC) with peripheral smear, genetic tests, etc., are time-consuming and expensive. Red blood indices from CBC can be used with machine learning models for the same task. Despite the available approaches for detecting Thalassemia carriers from CBC data, gaps exist between the desired and achieved accuracy. Moreover, the data imbalance problem is not well studied, which makes the models less generalizable. This study proposes a highly accurate approach for [Formula: see text]-Thalassemia detection using red blood indices from CBC augmented by supervised machine learning. Because not all features carry predictive information regarding the target variable, this study employs a unified framework of two feature selection techniques: Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). The data imbalance between [Formula: see text]-Thalassemia carriers and non-carriers is handled by Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic (ADASYN). Extensive experiments are performed using many state-of-the-art machine learning models and deep learning models. Experimental results indicate the superiority of the proposed approach over existing approaches with an accuracy score of 0.96.


Thalassemia , Humans , Animals , Blood Cell Count , Algorithms , Chromatography, High Pressure Liquid , Disease Vectors
20.
Healthcare (Basel) ; 10(11)2022 Nov 08.
Article En | MEDLINE | ID: mdl-36360571

White blood cell (WBC) type classification is a task of significant importance for diagnosis using microscopic images of WBC, which develop immunity to fight against infections and foreign substances. WBCs consist of different types, and abnormalities in a type of WBC may potentially represent a disease such as leukemia. Existing studies are limited by low accuracy and overrated performance, often caused by model overfit due to an imbalanced dataset. Additionally, many studies consider a lower number of WBC types, and the accuracy is exaggerated. This study presents a hybrid feature set of selective features and synthetic minority oversampling technique-based resampling to mitigate the influence of the above-mentioned problems. Furthermore, machine learning models are adopted for being less computationally complex, requiring less data for training, and providing robust results. Experiments are performed using both machine- and deep learning models for performance comparison using the original dataset, augmented dataset, and oversampled dataset to analyze the performances of the models. The results suggest that a hybrid feature set of both texture and RGB features from microscopic images, selected using Chi2, produces a high accuracy of 0.97 with random forest. Performance appraisal using k-fold cross-validation and comparison with existing state-of-the-art studies shows that the proposed approach outperforms existing studies regarding the obtained accuracy and computational complexity.

...