ABSTRACT
This paper presents a comprehensive exploration of machine learning algorithms (MLAs) and feature selection techniques for accurate heart disease prediction (HDP) in modern healthcare. By focusing on diverse datasets encompassing various challenges, the research sheds light on optimal strategies for early detection. MLAs such as Decision Trees (DT), Random Forests (RF), Support Vector Machines (SVM), Gaussian Naive Bayes (NB), and others were studied, with precision and recall metrics emphasized for robust predictions. Our study addresses challenges in real-world data through data cleaning and one-hot encoding, enhancing the integrity of our predictive models. Feature selection techniques, namely Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and univariate feature selection, play a crucial role in identifying relevant features and reducing data dimensionality. Our findings show the impact of these techniques on improving prediction accuracy. Optimized models for each dataset were obtained through grid-search hyperparameter tuning, with the resulting configurations outlined in detail. Notably, 99.12% accuracy was achieved on the first Kaggle dataset, demonstrating the potential for accurate HDP. Model robustness across diverse datasets was highlighted, with caution urged against overfitting. The study emphasizes the need for validation on unseen data and encourages ongoing research into generalizability. Serving as a practical guide, this research aids researchers and practitioners in HDP model development, informing clinical decisions and healthcare resource allocation. By providing insights into effective algorithms and techniques, the paper contributes to reducing heart disease-related morbidity and mortality, supporting the healthcare community's ongoing efforts.
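The grid-search hyperparameter tuning mentioned above can be sketched in plain Python. This is a minimal illustration of the exhaustive-search idea only: the parameter names (`max_depth`, `n_estimators`) and the toy scoring function are hypothetical stand-ins, not the study's actual configurations or cross-validated accuracy.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination and
    return the best-scoring configuration."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy scoring function standing in for cross-validated accuracy;
# it peaks at max_depth=5, n_estimators=200.
def toy_score(params):
    return -abs(params["max_depth"] - 5) - abs(params["n_estimators"] - 200) / 100

grid = {"max_depth": [3, 5, 7], "n_estimators": [100, 200, 300]}
best, score = grid_search(grid, toy_score)
print(best)  # → {'max_depth': 5, 'n_estimators': 200}
```

In practice a library routine (e.g. scikit-learn's `GridSearchCV`) performs this loop with cross-validation built in; the sketch only shows the underlying search.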
Subject(s)
Heart Diseases , Machine Learning , Precision Medicine , Humans , Precision Medicine/methods , Algorithms , Support Vector Machine

ABSTRACT
Rapid urbanization has caused severe deterioration of air quality globally, leading to increased hospitalization and premature deaths. Therefore, accurate prediction of air quality is crucial for mitigation planning to support urban sustainability and resilience. Although some studies have predicted air pollutants such as particulate matter (PM) using machine learning algorithms (MLAs), there is a paucity of studies on spatial hazard assessment with respect to the air quality index (AQI). Incorporating PM in AQI studies is crucial because of its easily inhalable micro-size, which has adverse impacts on ecology, the environment, and human health. Accurate and timely prediction of the air quality index can ensure adequate intervention to aid air quality management. Therefore, this study undertakes a spatial hazard assessment of the air quality index using particulate matter with a diameter of 10 µm or less (PM10) in Selangor, Malaysia, by developing four machine learning models: eXtreme Gradient Boosting (XGBoost), random forest (RF), K-nearest neighbour (KNN), and Naive Bayes (NB). Spatially processed data such as NDVI, SAVI, BU, LST, Ws, slope, elevation, and road density were used for the modelling. The models were trained with 70% of the dataset, while 30% was used for cross-validation. Results showed that XGBoost had the highest overall accuracy (0.989) and precision (0.995), followed by random forest (0.989, 0.993), K-nearest neighbour (0.987, 0.984), and Naive Bayes (0.917, 0.922). The spatial air quality maps were generated by integrating the geographical information system (GIS) with the four MLAs, and they correlated with Malaysia's air pollution index. The maps indicate that air quality in Selangor is satisfactory and poses no threat to health. Nevertheless, the two best-performing algorithms (XGBoost and RF) indicate that air quality is moderate over a high percentage of the area.
The study concludes that successful air pollution management policies such as green infrastructure practices, improvement of energy efficiency, and restrictions on heavy-duty vehicles can be adopted in Selangor and other Southeast Asian cities to prevent future deterioration of air quality.
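The overall accuracy and precision figures reported for the four classifiers can be computed as sketched below. The label values and predictions are made-up toy data for illustration, not the study's AQI classes or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive):
    """Of all samples predicted as `positive`, the fraction that truly are."""
    true_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not true_of_predicted:
        return 0.0
    return sum(t == positive for t in true_of_predicted) / len(true_of_predicted)

# Toy example: "unhealthy" taken as the positive class.
y_true = ["good", "unhealthy", "good", "unhealthy", "good"]
y_pred = ["good", "unhealthy", "unhealthy", "unhealthy", "good"]
print(accuracy(y_true, y_pred))                # → 0.8
print(precision(y_true, y_pred, "unhealthy"))  # 2/3 ≈ 0.667
```

For a multi-class AQI map, precision would typically be averaged over classes (macro or weighted), which the abstract does not specify.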
Subject(s)
Air Pollutants , Air Pollution , Humans , Geographic Information Systems , Bayes Theorem , Cities , Malaysia , Sustainable Growth , Air Pollution/analysis , Air Pollutants/analysis , Particulate Matter/analysis , Machine Learning , Algorithms

ABSTRACT
This paper proposes a methodology for correlating products derived from Synthetic Aperture Radar (SAR) measurements with laser profilometric road roughness surveys. The procedure stems from two previous studies, in which several Machine Learning Algorithms (MLAs) were calibrated to predict the average vertical displacement (in mm/year) of road pavements resulting from exogenous phenomena such as subsidence. These algorithms are based on surveys performed with Persistent Scatterer Interferometric SAR (PS-InSAR) over an area of 964 km² in the Tuscany Region, Central Italy. Starting from this basis, in this paper we propose to integrate the information provided by these MLAs with 10 km of in situ profilometric measurements of the pavement surface roughness and the corresponding calculation of the International Roughness Index (IRI). Accordingly, the aim is to assess whether, and to what extent, there is an association between displacements estimated by MLAs and IRI values. If a dependence exists, we may argue that road regularity is driven by exogenous phenomena and that MLAs can replace in situ surveys, saving considerable time and money. In this research framework, results reveal several road sections with a clear association between the two methods, while in others the relationship is weaker and in situ activities cannot be bypassed to evaluate the real pavement conditions. We conclude that, in these stretches, road regularity is driven by endogenous factors that the MLAs did not incorporate during training. Once additional MLAs conditioned on endogenous factors (such as traffic flow, the structure of the pavement layers, and material characteristics) have been developed, practitioners should be able to estimate pavement quality over extensive and complex road networks quickly, automatically, and at relatively low cost.
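The per-section association between MLA-estimated displacement and IRI described above amounts to correlating two paired series. A minimal sketch using the Pearson correlation coefficient is shown below; the displacement and IRI values are invented for illustration and do not come from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-section values: displacement (mm/year) vs IRI (m/km).
displacement = [1.2, 2.5, 0.8, 3.1, 1.9]
iri = [2.0, 3.4, 1.7, 4.0, 2.8]
r = pearson(displacement, iri)
print(round(r, 3))  # close to 1 → strong association for these sections
```

A section-by-section r (or a rank correlation, if the relationship is monotonic but nonlinear) would distinguish the stretches where SAR-based predictions track IRI from those where endogenous factors dominate.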