Results 1 - 8 of 8
1.
PLoS One ; 19(9): e0307536, 2024.
Article in English | MEDLINE | ID: mdl-39226285

ABSTRACT

Educational Data Mining (EDM) holds promise in uncovering insights from educational data to predict and enhance students' performance. This paper presents an advanced EDM system tailored for classifying and improving tertiary students' programming skills. Our approach emphasizes effective feature engineering, appropriate classification techniques, and the integration of Explainable Artificial Intelligence (XAI) to elucidate model decisions. Through rigorous experimentation, including an ablation study and evaluation of six machine learning algorithms, we introduce a novel ensemble method, Stacking-SRDA, which outperforms others in accuracy, precision, recall, F1-score, ROC curve, and McNemar test. Leveraging XAI tools, we provide insights into model interpretability. Additionally, we propose a system for identifying skill gaps in programming among weaker students, offering tailored recommendations for skill enhancement.


Subjects
Data Mining , Students , Data Mining/methods , Humans , Machine Learning , Algorithms , Artificial Intelligence
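
The abstract for result 1 names the McNemar test as one of its comparison criteria but does not say which variant was used. The following is a minimal sketch of the exact (binomial) two-sided form, assuming only the two discordant counts are available; the function name and arguments are illustrative and not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for two classifiers on the same test set.

    b: samples classifier A got right and classifier B got wrong
    c: samples classifier A got wrong and classifier B got right
    (Hypothetical argument names; only discordant pairs enter the test.)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A small p-value suggests the two classifiers make different kinds of errors; the samples both models get right or both get wrong do not affect the result.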
2.
Sci Rep ; 14(1): 15154, 2024 07 02.
Article in English | MEDLINE | ID: mdl-38956297

ABSTRACT

Historically, the analysis of stimulus-dependent time-frequency patterns has been the cornerstone of most electroencephalography (EEG) studies. The abnormal oscillations in high-frequency waves associated with psychotic disorders during sensory and cognitive tasks have been studied many times. However, any significant dissimilarity in the resting-state low-frequency bands is yet to be established. Spectral analysis of the alpha and delta band waves shows the effectiveness of stimulus-independent EEG in identifying the abnormal activity patterns of pathological brains. A generalized model incorporating multiple frequency bands should be more efficient in associating potential EEG biomarkers with first-episode psychosis (FEP), leading to an accurate diagnosis. We explore multiple machine-learning methods, including random forest, support vector machine, and Gaussian process classifier (GPC), to demonstrate the practicality of resting-state power spectral density (PSD) to distinguish patients with FEP from healthy controls. A comprehensive discussion of our preprocessing methods for PSD analysis and a detailed comparison of different models are included in this paper. The GPC model outperforms the other models with a specificity of 95.78% to show that PSD can be used as an effective feature extraction technique for analyzing and classifying resting-state EEG signals of psychiatric disorders.


Subjects
Electroencephalography , Psychotic Disorders , Support Vector Machine , Humans , Psychotic Disorders/physiopathology , Psychotic Disorders/diagnosis , Electroencephalography/methods , Female , Male , Adult , Young Adult , Rest/physiology , Machine Learning , Brain/physiopathology , Adolescent , Signal Processing, Computer-Assisted
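
To illustrate what a PSD band feature of the kind described in result 2 might look like, here is a rough sketch that computes a plain periodogram with a naive DFT and sums the power in two bands. The band limits (delta 0.5 to 4 Hz, alpha 8 to 13 Hz), the synthetic 10 Hz test signal, and the function names are assumptions for illustration; the abstract does not state the estimator or band edges the authors used.

```python
import cmath
import math


def periodogram(signal, fs):
    """Return (freqs, power) at non-negative frequencies for a real signal.

    Uses a naive O(n^2) DFT and no one-sided doubling; fine for a short demo.
    """
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum the periodogram bins with low <= f < high (summed bin power)."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


fs = 100  # assumed sampling rate in Hz
# Synthetic two-second "recording": a pure 10 Hz oscillation.
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
freqs, power = periodogram(signal, fs)
alpha = band_power(freqs, power, 8, 13)   # assumed alpha band
delta = band_power(freqs, power, 0.5, 4)  # assumed delta band
```

For this synthetic signal nearly all power falls in the alpha band. In practice an averaged estimator such as Welch's method is usually preferred over a single periodogram because it has lower variance.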
3.
PLoS One ; 19(7): e0307027, 2024.
Article in English | MEDLINE | ID: mdl-39008472

ABSTRACT

The rise of social media has changed how people view connections. Machine Learning (ML)-based sentiment analysis and news categorization help understand emotions and access news. However, most studies focus on complex models requiring heavy resources and slowing inference times, making deployment difficult in resource-limited environments. In this paper, we process both structured and unstructured data, determining the polarity of text using the TextBlob scheme to determine the sentiment of news headlines. We propose a Stochastic Gradient Descent (SGD)-based Ridge classifier (RC), SGDR, blended with an advanced string processing technique to effectively classify news articles. Additionally, we explore existing supervised and unsupervised ML algorithms to gauge the effectiveness of our SGDR classifier. The scalability and generalization capability of SGD and L2 regularization techniques in RCs to handle overfitting and balance bias and variance provide the proposed SGDR with better classification capability. Experimental results highlight that our string processing pipeline significantly boosts the performance of all ML models. Notably, our ensemble SGDR classifier surpasses all state-of-the-art ML algorithms, achieving an impressive 98.12% accuracy. McNemar's significance tests reveal that our SGDR classifier achieves a 1% significance level improvement over K-Nearest Neighbor, Decision Tree, and AdaBoost and a 5% significance level improvement over other algorithms. These findings underscore the superior proficiency of linear models in news categorization compared to tree-based and nonlinear counterparts. This study contributes valuable insights into the efficacy of the proposed methodology, elucidating its potential for news categorization and sentiment analysis.


Subjects
Algorithms , Machine Learning , Social Media , Humans , Emotions
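
The combination of SGD with an L2 (ridge) penalty mentioned in result 3 can be sketched in a few lines. This is a generic illustration, assuming a squared-error loss on targets of -1 and +1 and a fixed learning rate; it is not the authors' SGDR pipeline, and the toy data, hyperparameters, and function names are made up for the example.

```python
def train_sgd_ridge(X, y, alpha=0.01, lr=0.05, epochs=200):
    """Fit a linear model with squared loss plus an L2 penalty using plain SGD.

    alpha is the L2 strength and lr the learning rate (both illustrative).
    The bias term is left unpenalized.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            # Gradient of 0.5*err^2 + 0.5*alpha*||w||^2 with respect to w.
            w = [wj - lr * (err * xj + alpha * wj) for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Classify by the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data standing in for vectorized text features.
X = [[0.0, 0.2], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [0.8, 0.8]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_sgd_ridge(X, y)
```

The penalty term shrinks the weights at every step, which is the mechanism the abstract credits for controlling overfitting; each update touches one sample at a time, which is why the approach scales to large corpora.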
4.
PLoS One ; 19(5): e0300785, 2024.
Article in English | MEDLINE | ID: mdl-38753669

ABSTRACT

Diabetes is a persistent metabolic disorder linked to elevated levels of blood glucose, commonly referred to as blood sugar. This condition can have detrimental effects on the heart, blood vessels, eyes, kidneys, and nerves as time passes. It is a chronic ailment that arises when the body fails to produce enough insulin or is unable to effectively use the insulin it produces. When diabetes is not properly managed, it often leads to hyperglycemia, a condition characterized by elevated blood sugar levels or impaired glucose tolerance. This can result in significant harm to various body systems, including the nerves and blood vessels. In this paper, we propose a multiclass diabetes mellitus detection and classification approach using an extremely imbalanced Laboratory of Medical City Hospital data dynamics. We also formulate a new dataset that is moderately imbalanced based on the Laboratory of Medical City Hospital data dynamics. To correctly identify the multiclass diabetes mellitus, we employ three machine learning classifiers, namely support vector machine, logistic regression, and k-nearest neighbor. We also focus on dimensionality reduction (feature selection: filter, wrapper, and embedded methods) to prune the unnecessary features and to scale up the classification performance. To optimize the classification performance of classifiers, we tune the model by hyperparameter optimization with 10-fold grid search cross-validation. In the case of the original extremely imbalanced dataset with 70:30 partition and support vector machine classifier, we achieved maximum accuracy of 0.964, precision of 0.968, recall of 0.964, F1-score of 0.962, Cohen kappa of 0.835, and AUC of 0.99 by using the top 4 features according to the filter method. By using the top 9 features according to wrapper-based sequential feature selection, the k-nearest neighbor provides an accuracy of 0.935 and 1.0 for the other performance metrics. 
For our created moderately imbalanced dataset with an 80:20 partition, the SVM classifier achieves a maximum accuracy of 0.938, and 1.0 for other performance metrics. For the multiclass diabetes mellitus detection and classification, our experiments outperformed previously conducted research based on the Laboratory of Medical City Hospital data dynamics.


Subjects
Diabetes Mellitus , Machine Learning , Humans , Iraq/epidemiology , Diabetes Mellitus/diagnosis , Diabetes Mellitus/blood , Support Vector Machine , Blood Glucose/analysis , Logistic Models
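
Result 4 reports Cohen's kappa next to accuracy, which matters for imbalanced data because kappa discounts agreement expected by chance. A minimal sketch of the standard formula follows, assuming labels are given as two equal-length sequences; the function name and the toy labels in the note below are illustrative only.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the product of the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (observed - expected) / (1 - expected)
```

With nine negatives and one positive, a model that always predicts the negative class reaches 0.9 accuracy but a kappa of 0, which is why the two metrics can diverge on skewed datasets.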
5.
PeerJ Comput Sci ; 10: e1917, 2024.
Article in English | MEDLINE | ID: mdl-38660196

ABSTRACT

Heart disease is one of the primary causes of morbidity and death worldwide. Millions of people have heart attacks every year, and only early-stage predictions can help to reduce the number. Researchers are working on designing and developing early-stage prediction systems using different advanced technologies, and machine learning (ML) is one of them. Almost all existing ML-based works consider the same dataset (intra-dataset) for the training and validation of their method. In particular, they do not consider inter-dataset performance checks, where different datasets are used in the training and testing phases. In an inter-dataset setup, existing ML models perform poorly, which is known as the inter-dataset discrepancy problem. This work focuses on mitigating the inter-dataset discrepancy problem by considering five available heart disease datasets and their combined form. All potential training and testing mode combinations are systematically executed to assess discrepancies before and after applying the proposed methods. Imbalance data handling using SMOTE-Tomek, feature selection using random forest (RF), and feature extraction using principal component analysis (PCA) with a long preprocessing pipeline are used to mitigate the inter-dataset discrepancy problem. The preprocessing pipeline builds on missing value handling using RF regression, log transformation, outlier removal, normalization, and data balancing that convert the datasets into a more ML-centric form. Support vector machine, K-nearest neighbors, decision tree, RF, eXtreme Gradient Boosting, Gaussian naive Bayes, logistic regression, and multilayer perceptron are used as classifiers. Experimental results show that feature selection and classification using RF produce better results than other combination strategies in both single- and inter-dataset setups. 
In certain configurations of individual datasets, RF demonstrates 100% accuracy and 96% accuracy during the feature selection phase in an inter-dataset setup, exhibiting commendable precision, recall, F1 score, specificity, and AUC score. The results indicate that an effective preprocessing technique has the potential to improve the performance of the ML model without necessitating the development of intricate prediction models. Addressing inter-dataset discrepancies introduces a novel research avenue, enabling the amalgamation of identical features from various datasets to construct a comprehensive global dataset within a specific domain.
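
The train-on-one, test-on-another protocol described in result 5 can be sketched as a small evaluation grid. Everything below is a toy stand-in: the two one-feature datasets "A" and "B", the midpoint-threshold "model", and the helper names are hypothetical and only show how an inter-dataset discrepancy appears in such a grid.

```python
from itertools import product


def cross_dataset_grid(datasets, fit, score):
    """Train on every dataset and score on every dataset, including itself."""
    models = {name: fit(data) for name, data in datasets.items()}
    return {
        (train, test): score(models[train], datasets[test])
        for train, test in product(datasets, repeat=2)
    }


def fit_threshold(data):
    """Toy model: threshold halfway between the two class means."""
    mean0 = sum(x for x, label in data if label == 0) / sum(1 for _, l in data if l == 0)
    mean1 = sum(x for x, label in data if label == 1) / sum(1 for _, l in data if l == 1)
    return (mean0 + mean1) / 2


def accuracy(threshold, data):
    """Fraction of samples where (x > threshold) matches the label."""
    return sum(int(x > threshold) == label for x, label in data) / len(data)


# Two toy datasets whose feature scale is shifted relative to each other.
datasets = {
    "A": [(1, 0), (2, 0), (8, 1), (9, 1)],
    "B": [(6, 0), (7, 0), (13, 1), (14, 1)],
}
grid = cross_dataset_grid(datasets, fit_threshold, accuracy)
```

In this toy case the diagonal entries (same dataset for training and testing) are perfect while the off-diagonal entries drop, which is the kind of gap the abstract attributes to inter-dataset discrepancy.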

6.
Article in English | MEDLINE | ID: mdl-38384147

ABSTRACT

The death of brain cells occurs when blood flow to a particular area of the brain is abruptly cut off, resulting in a stroke. Early recognition of stroke symptoms is essential to prevent strokes and promote a healthy lifestyle. FAST tests (looking for abnormalities in the face, arms, and speech) have limitations in reliability and accuracy for diagnosing strokes. This research employs machine learning (ML) techniques to develop and assess multiple ML models to establish a robust stroke risk prediction framework. This research uses a stacking-based ensemble method to select the best three ML models and combine their collective intelligence. An empirical evaluation of a publicly available stroke prediction dataset demonstrates the superior performance of the proposed stacking-based ensemble model, with only one misclassification. The experimental results reveal that the proposed stacking model surpasses other state-of-the-art research, achieving accuracy, precision, and F1-score of 99.99%, recall of 100%, and receiver operating characteristics (ROC), Matthews correlation coefficient (MCC), and Kappa scores of 1.0. Furthermore, SHapley Additive exPlanations (SHAP) are employed to analyze the predictions of the black-box ML models. The findings highlight that age, BMI, and glucose level are the most significant risk factors for stroke prediction. These findings contribute to the development of more efficient techniques for stroke prediction, potentially saving many lives.
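
For readers unfamiliar with the Matthews correlation coefficient cited in result 6, the following sketch computes it from the four cells of a binary confusion matrix. The function name is illustrative, and the counts in the note below are hypothetical, not the paper's data.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts; ranges from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

With hypothetical counts of 499 true positives, 500 true negatives, and a single false negative, the MCC is slightly below 1, so a reported value of 1.0 alongside one misclassification is presumably a rounded figure.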

7.
Front Plant Sci ; 14: 1234555, 2023.
Article in English | MEDLINE | ID: mdl-37636091

ABSTRACT

Agriculture is the most critical sector for food supply on Earth, and it is also responsible for supplying raw materials for other industrial productions. Currently, the growth in agricultural production is not sufficient to keep up with the growing population, which may result in a food shortfall for the world's inhabitants. As a result, increasing food production is crucial for developing nations with limited land and resources. It is essential to select a suitable crop for a specific region to increase its production rate. Effective crop production forecasting in that area based on historical data, including environmental and cultivation areas, and crop production amount, is required. However, the data for such forecasting are not publicly available. As such, in this paper, we take a case study of a developing country, Bangladesh, whose economy relies on agriculture. We first gather and preprocess the data from the relevant research institutions of Bangladesh and then propose an ensemble machine learning approach, called K-nearest Neighbor Random Forest Ridge Regression (KRR), to effectively predict the production of the major crops (three different kinds of rice, potato, and wheat). KRR is designed after investigating five existing traditional machine learning (Support Vector Regression, Naïve Bayes, and Ridge Regression) and ensemble learning (Random Forest and CatBoost) algorithms. We consider four classical evaluation metrics, i.e., mean absolute error, mean square error (MSE), root MSE, and R², to evaluate the performance of the proposed KRR over the other machine learning models. It shows 0.009 MSE, 99% R² for Aus; 0.92 MSE, 90% R² for Aman; 0.246 MSE, 99% R² for Boro; 0.062 MSE, 99% R² for wheat; and 0.016 MSE, 99% R² for potato production prediction. The Diebold-Mariano test is conducted to check the robustness of the proposed ensemble model, KRR. In most cases, it shows 1% and 5% significance compared to the benchmark ML models. 
Lastly, we design a recommender system that suggests suitable crops for a specific land area for cultivation in the next season. We believe that the proposed paradigm will help the farmers and personnel in the agricultural sector leverage proper crop cultivation and production.
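
The four regression metrics named in result 7 (mean absolute error, MSE, root MSE, and R²) follow directly from the residuals. A minimal sketch using their standard definitions, with an illustrative function name and no connection to the paper's data:

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 compares the model against always predicting the mean of y_true.
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}
```

MSE depends on the scale of the target, so MSE values for different crops are comparable only if the targets were scaled the same way, whereas R² is scale-free.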

8.
Sensors (Basel) ; 23(2), 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36679453

ABSTRACT

A hyperspectral image (HSI), which contains a number of contiguous and narrow spectral wavelength bands, is a valuable source of data for ground cover examinations. Classification using the entire original HSI suffers from the "curse of dimensionality" problem because (i) the image bands are highly correlated both spectrally and spatially, (ii) not every band can carry equal information, (iii) there is a lack of enough training samples for some classes, and (iv) the overall computational cost is high. Therefore, effective feature (band) reduction is necessary through feature extraction (FE) and/or feature selection (FS) for improving the classification in a cost-effective manner. Principal component analysis (PCA) is a frequently adopted unsupervised FE method in HSI classification. Nevertheless, its performance worsens when the dataset is noisy, and the computational cost becomes high. Consequently, this study first proposed an efficient FE approach using a normalized mutual information (NMI)-based band grouping strategy, where the classical PCA was applied to each band subgroup for intrinsic FE. Finally, the subspace of the most effective features was generated by the NMI-based minimum redundancy and maximum relevance (mRMR) FS criteria. The subspace of features was then classified using the kernel support vector machine. Two real HSIs collected by the AVIRIS and HYDICE sensors were used in an experiment. The experimental results demonstrated that the proposed feature reduction approach significantly improved the classification performance. It achieved the highest overall classification accuracy of 94.93% for the AVIRIS dataset and 99.026% for the HYDICE dataset. Moreover, the proposed approach reduced the computational cost compared with the studied methods.


Subjects
Support Vector Machine , Principal Component Analysis
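
The normalized mutual information (NMI) used for band grouping in result 8 can be illustrated for two discrete sequences, for example quantized pixel intensities from two spectral bands. Several normalizations are in use; this sketch assumes division by the geometric mean of the two entropies, which may differ from the paper's choice, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def normalized_mutual_information(x, y):
    """NMI(x, y) = I(x; y) / sqrt(H(x) * H(y)) for equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(x; y) = sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    mi = sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant sequence carries no information
    return mi / math.sqrt(hx * hy)
```

Values near 1 indicate two bands that carry almost the same information, which is the redundancy a grouping step tries to capture; values near 0 indicate statistically independent bands.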