ABSTRACT
Non-alcoholic fatty liver disease (NAFLD) is characterized by the accumulation of excess fat in the liver. If left undiagnosed and untreated in its early stages, NAFLD can progress to more severe conditions such as inflammation, liver fibrosis, cirrhosis, and even liver failure. In this study, machine learning techniques were employed to predict NAFLD from affordable and accessible laboratory test data, and the conventional hepatic steatosis index (HSI) was calculated for comparison. Six algorithms (random forest, k-nearest neighbors, logistic regression, support vector machine, extreme gradient boosting, and decision tree), along with an ensemble model, were applied to the dataset. The objective was to develop a cost-effective tool that enables early diagnosis and, in turn, better management of the condition. Class imbalance was addressed using the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors (SMOTEENN). Several evaluation metrics, including F1 score, precision, accuracy, recall, the confusion matrix, mean absolute error (MAE), the receiver operating characteristic (ROC) curve, and area under the curve (AUC), were used to assess the suitability of each technique for disease prediction. Experimental results on the National Health and Nutrition Examination Survey (NHANES) dataset showed that the ensemble model achieved the highest accuracy (0.99) and AUC (1.00), exceeding both the six individual machine learning models and HSI. These findings indicate that the ensemble model holds potential as a tool for healthcare professionals to predict NAFLD from accessible and cost-effective laboratory test data.
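For orientation, the HSI baseline mentioned above can be sketched in a few lines. The abstract does not state the formula or the cutoffs used in this study, so the sketch assumes the commonly published definition, HSI = 8 × (ALT/AST) + BMI, plus 2 for female sex and plus 2 for diabetes, with the usual thresholds of 30 and 36. The function names, parameter names, and category labels are hypothetical and chosen only for illustration.

```python
def hepatic_steatosis_index(alt, ast, bmi, is_female, has_diabetes):
    """Compute HSI under the commonly published definition (an assumption here).

    alt, ast: serum ALT and AST in the same units (e.g. U/L)
    bmi: body mass index in kg/m^2
    """
    if ast <= 0:
        raise ValueError("AST must be positive to form the ALT/AST ratio")
    hsi = 8.0 * (alt / ast) + bmi
    if is_female:
        hsi += 2.0  # sex adjustment in the assumed definition
    if has_diabetes:
        hsi += 2.0  # diabetes adjustment in the assumed definition
    return hsi


def classify_hsi(hsi, low=30.0, high=36.0):
    """Map an HSI value to a coarse category.

    The 30 and 36 cutoffs are the commonly cited ones, not values
    confirmed by this study.
    """
    if hsi < low:
        return "NAFLD unlikely"
    if hsi > high:
        return "NAFLD likely"
    return "indeterminate"


if __name__ == "__main__":
    # Illustrative values only, not taken from NHANES.
    score = hepatic_steatosis_index(alt=40, ast=20, bmi=25, is_female=False, has_diabetes=False)
    print(score, classify_hsi(score))
```

Because HSI depends on only four routinely collected inputs, it serves as a simple reference point against which the machine learning models and the ensemble can be compared.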