Pesquisa | BVS - MINISTÉRIO DA SAÚDE

1.

Automated machine learning for fabric quality prediction: a comparative analysis.

Metin, Ahmet; Bilgin, Turgay Tugay.

PeerJ Comput Sci ; 10: e2188, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39145237

RESUMO

The enhancement of fabric quality prediction in the textile manufacturing sector is achieved by utilizing information derived from sensors within the Internet of Things (IoT) and Enterprise Resource Planning (ERP) systems linked to sensors embedded in textile machinery. The integration of Industry 4.0 concepts is instrumental in harnessing IoT sensor data, which, in turn, leads to improvements in productivity and reduced lead times in textile manufacturing processes. This study addresses the issue of imbalanced data pertaining to fabric quality within the textile manufacturing industry. It encompasses an evaluation of seven open-source automated machine learning (AutoML) technologies, namely FLAML (Fast Lightweight AutoML), AutoViML (Automatically Build Variant Interpretable ML models), EvalML (Evaluation Machine Learning), AutoGluon, H2OAutoML, PyCaret, and TPOT (Tree-based Pipeline Optimization Tool). The most suitable solutions are chosen for certain circumstances by employing an innovative approach that finds a compromise among computational efficiency and forecast accuracy. The results reveal that EvalML emerges as the top-performing AutoML model for a predetermined objective function, particularly excelling in terms of mean absolute error (MAE). On the other hand, even with longer inference periods, AutoGluon performs better than other methods in measures like mean absolute percentage error (MAPE), root mean squared error (RMSE), and r-squared. Additionally, the study explores the feature importance rankings provided by each AutoML model, shedding light on the attributes that significantly influence predictive outcomes. Notably, sin/cos encoding is found to be particularly effective in characterizing categorical variables with a large number of unique values. This study includes useful information about the application of AutoML in the textile industry and provides a roadmap for employing Industry 4.0 technologies to enhance fabric quality prediction. The research highlights the importance of striking a balance between predictive accuracy and computational efficiency, emphasizes the significance of feature importance for model interpretability, and lays the groundwork for future investigations in this field.

2.

Automated machine learning models for nonalcoholic fatty liver disease assessed by controlled attenuation parameter from the NHANES 2017-2020.

Liu, Lihe; Lin, Jiaxi; Liu, Lu; Gao, Jingwen; Xu, Guoting; Yin, Minyue; Liu, Xiaolin; Wu, Airong; Zhu, Jinzhou.

Digit Health ; 10: 20552076241272535, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39119551

RESUMO

Background: Nonalcoholic fatty liver disease (NAFLD) is recognized as one of the most common chronic liver diseases worldwide. This study aims to assess the efficacy of automated machine learning (AutoML) in the identification of NAFLD using a population-based cross-sectional database. Methods: All data, including laboratory examinations, anthropometric measurements, and demographic variables, were obtained from the National Health and Nutrition Examination Survey (NHANES). NAFLD was defined by controlled attenuation parameter (CAP) in liver transient ultrasound elastography. The least absolute shrinkage and selection operator (LASSO) regression analysis was employed for feature selection. Six algorithms were utilized on the H2O-automated machine learning platform: Gradient Boosting Machine (GBM), Distributed Random Forest (DRF), Extremely Randomized Trees (XRT), Generalized Linear Model (GLM), eXtreme Gradient Boosting (XGBoost), and Deep Learning (DL). These algorithms were selected for their diverse strengths, including their ability to handle complex, non-linear relationships, provide high predictive accuracy, and ensure interpretability. The models were evaluated by area under receiver operating characteristic curves (AUC) and interpreted by the calibration curve, the decision curve analysis, variable importance plot, SHapley Additive exPlanation plot, partial dependence plots, and local interpretable model agnostic explanation plot. Results: A total of 4177 participants (non-NAFLD 3167 vs NAFLD 1010) were included to develop and validate the AutoML models. The model developed by XGBoost performed better than other models in AutoML, achieving an AUC of 0.859, an accuracy of 0.795, a sensitivity of 0.773, and a specificity of 0.802 on the validation set. Conclusions: We developed an XGBoost model to better evaluate the presence of NAFLD. Based on the XGBoost model, we created an R Shiny web-based application named Shiny NAFLD (http://39.101.122.171:3838/App2/). This application demonstrates the potential of AutoML in clinical research and practice, offering a promising tool for the real-world identification of NAFLD.

3.

Transfer Learning Video Classification of Preserved, Mid-Range, and Reduced Left Ventricular Ejection Fraction in Echocardiography.

Decoodt, Pierre; Sierra-Sosa, Daniel; Anghel, Laura; Cuminetti, Giovanni; De Keyzer, Eva; Morissens, Marielle.

Diagnostics (Basel) ; 14(13)2024 Jul 05.

Artigo em Inglês | MEDLINE | ID: mdl-39001328

RESUMO

Identifying patients with left ventricular ejection fraction (EF), either reduced [EF < 40% (rEF)], mid-range [EF 40-50% (mEF)], or preserved [EF > 50% (pEF)], is considered of primary clinical importance. An end-to-end video classification using AutoML in Google Vertex AI was applied to echocardiographic recordings. Datasets balanced by majority undersampling, each corresponding to one out of three possible classifications, were obtained from the Standford EchoNet-Dynamic repository. A train-test split of 75/25 was applied. A binary video classification of rEF vs. not rEF demonstrated good performance (test dataset: ROC AUC score 0.939, accuracy 0.863, sensitivity 0.894, specificity 0.831, positive predicting value 0.842). A second binary classification of not pEF vs. pEF was slightly less performing (test dataset: ROC AUC score 0.917, accuracy 0.829, sensitivity 0.761, specificity 0.891, positive predicting value 0.888). A ternary classification was also explored, and lower performance was observed, mainly for the mEF class. A non-AutoML PyTorch implementation in open access confirmed the feasibility of our approach. With this proof of concept, end-to-end video classification based on transfer learning to categorize EF merits consideration for further evaluation in prospective clinical studies.

4.

Prediction of Seropositivity in Suspected Autoimmune Encephalitis by Use of Radiomics: A Radiological Proof-of-Concept Study.

Stake, Jacob; Spiekers, Christine; Akkurt, Burak Han; Heindel, Walter; Brix, Tobias; Mannil, Manoj; Musigmann, Manfred.

Diagnostics (Basel) ; 14(11)2024 May 21.

Artigo em Inglês | MEDLINE | ID: mdl-38893597

RESUMO

In this study, we sought to evaluate the capabilities of radiomics and machine learning in predicting seropositivity in patients with suspected autoimmune encephalitis (AE) from MR images obtained at symptom onset. In 83 patients diagnosed with AE between 2011 and 2022, manual bilateral segmentation of the amygdala was performed on pre-contrast T2 images using 3D Slicer open-source software. Our sample of 83 patients contained 43 seropositive and 40 seronegative AE cases. Images were obtained at our tertiary care center and at various secondary care centers in North Rhine-Westphalia, Germany. The sample was randomly split into training data and independent test data. A total of 107 radiomic features were extracted from bilateral regions of interest (ROIs). Automated machine learning (AutoML) was used to identify the most promising machine learning algorithms. Feature selection was performed using recursive feature elimination (RFE) and based on the determination of the most important features. Selected features were used to train various machine learning algorithms on 100 different data partitions. Performance was subsequently evaluated on independent test data. Our radiomics approach was able to predict the presence of autoantibodies in the independent test samples with a mean AUC of 0.90, a mean accuracy of 0.83, a mean sensitivity of 0.84 and a mean specificity of 0.82, with Lasso regression models yielding the most promising results. These results indicate that radiomics-based machine learning could be a promising tool in predicting the presence of autoantibodies in suspected AE patients. Given the implications of seropositivity for definitive diagnosis of suspected AE cases, this may expedite diagnostic workup even before results from specialized laboratory testing can be obtained. Furthermore, in conjunction with recent publications, our results indicate that characterization of AE subtypes by use of radiomics may become possible in the future, potentially allowing physicians to tailor treatment in the spirit of personalized medicine even before laboratory workup is completed.

5.

Predicting cervical cancer risk probabilities using advanced H20 AutoML and local interpretable model-agnostic explanation techniques.

Prusty, Sashikanta; Patnaik, Srikanta; Dash, Sujit Kumar; Prusty, Sushree Gayatri Priyadarsini; Rautaray, Jyotirmayee; Sahoo, Ghanashyam.

PeerJ Comput Sci ; 10: e1916, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38855252

RESUMO

Background: Cancer is positioned as a major disease, particularly for middle-aged people, which remains a global concern that can develop in the form of abnormal growth of body cells at any place in the human body. Cervical cancer, often known as cervix cancer, is cancer present in the female cervix. In the area where the endocervix (upper two-thirds of the cervix) and ectocervix (lower third of the cervix) meet, the majority of cervical cancers begin. Despite an influx of people entering the healthcare industry, the demand for machine learning (ML) specialists has recently outpaced the supply. To close the gap, user-friendly applications, such as H2O, have made significant progress these days. However, traditional ML techniques handle each stage of the process separately; whereas H2O AutoML can automate a major portion of the ML workflow, such as automatic training and tuning of multiple models within a user-defined timeframe. Methods: Thus, novel H2O AutoML with local interpretable model-agnostic explanations (LIME) techniques have been proposed in this research work that enhance the predictability of an ML model in a user-defined timeframe. We herein collected the cervical cancer dataset from the freely available Kaggle repository for our research work. The Stacked Ensembles approach, on the other hand, will automatically train H2O models to create a highly predictive ensemble model that will outperform the AutoML Leaderboard in most instances. The novelty of this research is aimed at training the best model using the AutoML technique that helps in reducing the human effort over traditional ML techniques in less amount of time. Additionally, LIME has been implemented over the H2O AutoML model, to uncover black boxes and to explain every individual prediction in our model. We have evaluated our model performance using the findprediction() function on three different idx values (i.e., 100, 120, and 150) to find the prediction probabilities of two classes for each feature. These experiments have been done in Lenovo core i7 NVidia GeForce 860M GPU laptop in Windows 10 operating system using Python 3.8.3 software on Jupyter 6.4.3 platform. Results: The proposed model resulted in the prediction probabilities depending on the features as 87%, 95%, and 87% for class '0' and 13%, 5%, and 13% for class '1' when idx_value=100, 120, and 150 for the first case; 100% for class '0' and 0% for class '1', when idx_value= 10, 12, and 15 respectively. Additionally, a comparative analysis has been drawn where our proposed model outperforms previous results found in cervical cancer research.

6.

Design and Implementation of an Intensive Care Unit Command Center for Medical Data Fusion.

Feng, Wen-Sheng; Chen, Wei-Cheng; Lin, Jiun-Yi; Tseng, How-Yang; Chen, Chieh-Lung; Chou, Ching-Yao; Cho, Der-Yang; Lin, Yi-Bing.

Sensors (Basel) ; 24(12)2024 Jun 17.

Artigo em Inglês | MEDLINE | ID: mdl-38931713

RESUMO

The rapid advancements in Artificial Intelligence of Things (AIoT) are pivotal for the healthcare sector, especially as the world approaches an aging society which will be reached by 2050. This paper presents an innovative AIoT-enabled data fusion system implemented at the CMUH Respiratory Intensive Care Unit (RICU) to address the high incidence of medical errors in ICUs, which are among the top three causes of mortality in healthcare facilities. ICU patients are particularly vulnerable to medical errors due to the complexity of their conditions and the critical nature of their care. We introduce a four-layer AIoT architecture designed to manage and deliver both real-time and non-real-time medical data within the CMUH-RICU. Our system demonstrates the capability to handle 22 TB of medical data annually with an average delay of 1.72 ms and a bandwidth of 65.66 Mbps. Additionally, we ensure the uninterrupted operation of the CMUH-RICU with a three-node streaming cluster (called Kafka), provided a failed node is repaired within 9 h, assuming a one-year node lifespan. A case study is presented where the AI application of acute respiratory distress syndrome (ARDS), leveraging our AIoT data fusion approach, significantly improved the medical diagnosis rate from 52.2% to 93.3% and reduced mortality from 56.5% to 39.5%. The results underscore the potential of AIoT in enhancing patient outcomes and operational efficiency in the ICU setting.

Assuntos

Inteligência Artificial , Unidades de Terapia Intensiva , Humanos , Síndrome do Desconforto Respiratório/terapia

7.

TinyNS: Platform-Aware Neurosymbolic Auto Tiny Machine Learning.

Saha, Swapnil Sayan; Sandha, Sandeep Singh; Aggarwal, Mohit; Wang, Brian; Han, Liying; DE Gortari Briseno, Julian; Srivastava, Mani.

ACM Trans Embed Comput Syst ; 23(3)2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38933471

RESUMO

Machine learning at the extreme edge has enabled a plethora of intelligent, time-critical, and remote applications. However, deploying interpretable artificial intelligence systems that can perform high-level symbolic reasoning and satisfy the underlying system rules and physics within the tight platform resource constraints is challenging. In this paper, we introduce TinyNS, the first platform-aware neurosymbolic architecture search framework for joint optimization of symbolic and neural operators. TinyNS provides recipes and parsers to automatically write microcontroller code for five types of neurosymbolic models, combining the context awareness and integrity of symbolic techniques with the robustness and performance of machine learning models. TinyNS uses a fast, gradient-free, black-box Bayesian optimizer over discontinuous, conditional, numeric, and categorical search spaces to find the best synergy of symbolic code and neural networks within the hardware resource budget. To guarantee deployability, TinyNS talks to the target hardware during the optimization process. We showcase the utility of TinyNS by deploying microcontroller-class neurosymbolic models through several case studies. In all use cases, TinyNS outperforms purely neural or purely symbolic approaches while guaranteeing execution on real hardware.

8.

Performance of Automated Machine Learning in Predicting Outcomes of Pneumatic Retinopexy.

Nisanova, Arina; Yavary, Arefeh; Deaner, Jordan; Ali, Ferhina S; Gogte, Priyanka; Kaplan, Richard; Chen, Kevin C; Nudleman, Eric; Grewal, Dilraj; Gupta, Meenakashi; Wolfe, Jeremy; Klufas, Michael; Yiu, Glenn; Soltani, Iman; Emami-Naeini, Parisa.

Ophthalmol Sci ; 4(5): 100470, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38827487

RESUMO

Purpose: Automated machine learning (AutoML) has emerged as a novel tool for medical professionals lacking coding experience, enabling them to develop predictive models for treatment outcomes. This study evaluated the performance of AutoML tools in developing models predicting the success of pneumatic retinopexy (PR) in treatment of rhegmatogenous retinal detachment (RRD). These models were then compared with custom models created by machine learning (ML) experts. Design: Retrospective multicenter study. Participants: Five hundred and thirty nine consecutive patients with primary RRD that underwent PR by a vitreoretinal fellow at 6 training hospitals between 2002 and 2022. Methods: We used 2 AutoML platforms: MATLAB Classification Learner and Google Cloud AutoML. Additional models were developed by computer scientists. We included patient demographics and baseline characteristics, including lens and macula status, RRD size, number and location of breaks, presence of vitreous hemorrhage and lattice degeneration, and physicians' experience. The dataset was split into a training (n = 483) and test set (n = 56). The training set, with a 2:1 success-to-failure ratio, was used to train the MATLAB models. Because Google Cloud AutoML requires a minimum of 1000 samples, the training set was tripled to create a new set with 1449 datapoints. Additionally, balanced datasets with a 1:1 success-to-failure ratio were created using Python. Main Outcome Measures: Single-procedure anatomic success rate, as predicted by the ML models. F2 scores and area under the receiver operating curve (AUROC) were used as primary metrics to compare models. Results: The best performing AutoML model (F2 score: 0.85; AUROC: 0.90; MATLAB), showed comparable performance to the custom model (0.92, 0.86) when trained on the balanced datasets. However, training the AutoML model with imbalanced data yielded misleadingly high AUROC (0.81) despite low F2-score (0.2) and sensitivity (0.17). Conclusions: We demonstrated the feasibility of using AutoML as an accessible tool for medical professionals to develop models from clinical data. Such models can ultimately aid in the clinical decision-making, contributing to better patient outcomes. However, outcomes can be misleading or unreliable if used naively. Limitations exist, particularly if datasets contain missing variables or are highly imbalanced. Proper model selection and data preprocessing can improve the reliability of AutoML tools. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

9.

Clinical performance of automated machine learning: A systematic review.

Thirunavukarasu, Arun James; Elangovan, Kabilan; Gutierrez, Laura; Hassan, Refaat; Li, Yong; Tan, Ting Fang; Cheng, Haoran; Teo, Zhen Ling; Lim, Gilbert; Ting, Daniel Shu Wei.

Ann Acad Med Singap ; 53(3): 187-207, 2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38920245

RESUMO

Introduction: Automated machine learning (autoML) removes technical and technological barriers to building artificial intelligence models. We aimed to summarise the clinical applications of autoML, assess the capabilities of utilised platforms, evaluate the quality of the evidence trialling autoML, and gauge the performance of autoML platforms relative to conventionally developed models, as well as each other. Method: This review adhered to a prospectively registered protocol (PROSPERO identifier CRD42022344427). The Cochrane Library, Embase, MEDLINE and Scopus were searched from inception to 11 July 2022. Two researchers screened abstracts and full texts, extracted data and conducted quality assessment. Disagreement was resolved through discussion and if required, arbitration by a third researcher. Results: There were 26 distinct autoML platforms featured in 82 studies. Brain and lung disease were the most common fields of study of 22 specialties. AutoML exhibited variable performance: area under the receiver operator characteristic curve (AUCROC) 0.35-1.00, F1-score 0.16-0.99, area under the precision-recall curve (AUPRC) 0.51-1.00. AutoML exhibited the highest AUCROC in 75.6% trials; the highest F1-score in 42.3% trials; and the highest AUPRC in 83.3% trials. In autoML platform comparisons, AutoPrognosis and Amazon Rekognition performed strongest with unstructured and structured data, respectively. Quality of reporting was poor, with a median DECIDE-AI score of 14 of 27. Conclusion: A myriad of autoML platforms have been applied in a variety of clinical contexts. The performance of autoML compares well to bespoke computational and clinical benchmarks. Further work is required to improve the quality of validation studies. AutoML may facilitate a transition to data-centric development, and integration with large language models may enable AI to build itself to fulfil user-defined goals.

Assuntos

Aprendizado de Máquina , Humanos , Pneumopatias/diagnóstico , Curva ROC , Encefalopatias/diagnóstico , Área Sob a Curva

10.

Bridging expertise with machine learning and automated machine learning in clinical medicine.

Lee, Chien-Chang; Park, James Yeongjun; Hsu, Wan-Ting.

Ann Acad Med Singap ; 53(3): 129-131, 2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38920240

Assuntos

Medicina Clínica , Aprendizado de Máquina , Humanos

11.

Enhancing prediction and analysis of UK road traffic accident severity using AI: Integration of machine learning, econometric techniques, and time series forecasting in public health research.

Sufian, Md Abu; Varadarajan, Jayasree; Niu, Mingbo.

Heliyon ; 10(7): e28547, 2024 Apr 15.

Artigo em Inglês | MEDLINE | ID: mdl-38623197

RESUMO

This research project explored into the intricacies of road traffic accidents severity in the UK, employing a potent combination of machine learning algorithms, econometric techniques, and traditional statistical methods to analyse longitudinal historical data. Our robust analysis framework includes descriptive, inferential, bivariate, multivariate methodologies, correlation analysis: Pearson's and Spearman's Rank Correlation Coefficient, multiple logistic regression models, Multicollinearity Assessment, and Model Validation. In addressing heteroscedasticity or autocorrelation in error terms, we've advanced the precision and reliability of our regression analyses using the Generalized Method of Moments (GMM). Additionally, our application of the Vector Autoregressive (VAR) model and the Autoregressive Integrated Moving Average (ARIMA) models have enabled accurate time series forecasting. With this approach, we've achieved superior predictive accuracy and marked by a Mean Absolute Scaled Error (MASE) of 0.800 and a Mean Error (ME) of -73.80 compared to a naive forecast. The project further extends its machine learning application by creating a random forest classifier model with a precision of 73%, a recall of 78%, and an F1-score of 73%. Building on this, we employed the H2O AutoML process to optimize our model selection, resulting in an XGBoost model that exhibits exceptional predictive power as evidenced by an RMSE of 0.1761205782994506 and MAE of 0.0874235576229789. Factor Analysis was leveraged to identify underlying variables or factors that explain the pattern of correlations within a set of observed variables. Scoring history, a tool to observe the model's performance throughout the training process was incorporated to ensure the highest possible performance of our machine learning models. We also incorporated Explainable AI (XAI) techniques, utilizing the SHAP (Shapley Additive Explanations) model to comprehend the contributing factors to accident severity. Features such as Driver_Home_Area_Type, Longitude, Driver_IMD_Decile, Road_Type, Casualty_Home_Area_Type, and Casualty_IMD_Decile were identified as significant influencers. Our research contributes to the nuanced understanding of traffic accident severity and demonstrates the potential of advanced statistical, econometric, machine learning techniques in informing evidence based interventions and policies for enhancing road safety.

12.

Development and Validation of Multimodal Models to Predict the 30-Day Mortality of ICU Patients Based on Clinical Parameters and Chest X-Rays.

Lin, Jiaxi; Yang, Jin; Yin, Minyue; Tang, Yuxiu; Chen, Liquan; Xu, Chang; Zhu, Shiqi; Gao, Jingwen; Liu, Lu; Liu, Xiaolin; Gu, Chenqi; Huang, Zhou; Wei, Yao; Zhu, Jinzhou.

J Imaging Inform Med ; 37(4): 1312-1322, 2024 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-38448758

RESUMO

We aimed to develop and validate multimodal ICU patient prognosis models that combine clinical parameters data and chest X-ray (CXR) images. A total of 3798 subjects with clinical parameters and CXR images were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database and an external hospital (the test set). The primary outcome was 30-day mortality after ICU admission. Automated machine learning (AutoML) and convolutional neural networks (CNNs) were used to construct single-modal models based on clinical parameters and CXR separately. An early fusion approach was used to integrate both modalities (clinical parameters and CXR) into a multimodal model named PrismICU. Compared to the single-modal models, i.e., the clinical parameter model (AUC = 0.80, F1-score = 0.43) and the CXR model (AUC = 0.76, F1-score = 0.45) and the scoring system APACHE II (AUC = 0.83, F1-score = 0.77), PrismICU (AUC = 0.95, F1 score = 0.95) showed improved performance in predicting the 30-day mortality in the validation set. In the test set, PrismICU (AUC = 0.82, F1-score = 0.61) was also better than the clinical parameters model (AUC = 0.72, F1-score = 0.50), CXR model (AUC = 0.71, F1-score = 0.36), and APACHE II (AUC = 0.62, F1-score = 0.50). PrismICU, which integrated clinical parameters data and CXR images, performed better than single-modal models and the existing scoring system. It supports the potential of multimodal models based on structured data and imaging in clinical management.

Assuntos

Unidades de Terapia Intensiva , Radiografia Torácica , Humanos , Masculino , Radiografia Torácica/métodos , Feminino , Pessoa de Meia-Idade , Idoso , Redes Neurais de Computação , Prognóstico , Aprendizado de Máquina , Mortalidade Hospitalar

13.

Urban ozone variability using automated machine learning: inference from different feature importance schemes.

Nath, Sankar Jyoti; Girach, Imran A; Harithasree, S; Bhuyan, Kalyan; Ojha, Narendra; Kumar, Manish.

Environ Monit Assess ; 196(4): 393, 2024 Mar 23.

Artigo em Inglês | MEDLINE | ID: mdl-38520559

RESUMO

Tropospheric ozone is an air pollutant at the ground level and a greenhouse gas which significantly contributes to the global warming. Strong anthropogenic emissions in and around urban environments enhance surface ozone pollution impacting the human health and vegetation adversely. However, observations are often scarce and the factors driving ozone variability remain uncertain in the developing regions of the world. In this regard, here, we conducted machine learning (ML) simulations of ozone variability and comprehensively examined the governing factors over a major urban environment (Ahmedabad) in western India. Ozone precursors (NO2, NO, CO, C5H8 and CH2O) from the CAMS (Copernicus Atmosphere Monitoring Service) reanalysis and meteorological parameters from the ERA5 (European Centre for Medium-Range Weather Forecast's (ECMWF) fifth-generation reanalysis) were included as features in the ML models. Automated ML (AutoML) fitted the deep learning model optimally and simulated the daily ozone with root mean square error (RMSE) of ~2 ppbv reproducing 84-88% of variability. The model performance achieved here is comparable to widely used ML models (RF-Random Forest and XGBoost-eXtreme Gradient Boosting). Explainability of the models is discussed through different schemes of feature importance, including SAGE (Shapley Additive Global importancE) and permutation importance. The leading features are found to be different from different feature importance schemes. We show that urban ozone could be simulated well (RMSE = 2.5 ppbv and R2 = 0.78) by considering first four leading features, from different schemes, which are consistent with ozone photochemistry. Our study underscores the need to conduct science-informed analysis of feature importance from multiple schemes to infer the roles of input variables in ozone variability. AutoML-based studies, exploiting potentials of long-term observations, can strongly complement the conventional chemistry-transport modelling and can also help in accurate simulation and forecast of urban ozone.

Assuntos

Poluentes Atmosféricos , Poluição do Ar , Ozônio , Humanos , Ozônio/análise , Poluição do Ar/análise , Monitoramento Ambiental , Poluentes Atmosféricos/análise , Aprendizado de Máquina

14.

Novel machine learning insights into the QM7b and QM9 quantum mechanics datasets.

Valdés, Julio J; Tchagang, Alain B.

J Comput Chem ; 45(15): 1193-1214, 2024 Jun 05.

Artigo em Inglês | MEDLINE | ID: mdl-38329198

RESUMO

This paper (i) explores the internal structure of two quantum mechanics datasets (QM7b, QM9), composed of several thousands of organic molecules and described in terms of electronic properties, and (ii) further explores an inverse design approach to molecular design consisting of using machine learning methods to approximate the atomic composition of molecules, using QM9 data. Understanding the structure and characteristics of this kind of data is important when predicting the atomic composition from physical-chemical properties in inverse molecular designs. Intrinsic dimension analysis, clustering, and outlier detection methods were used in the study. They revealed that for both datasets the intrinsic dimensionality is several times smaller than the descriptive dimensions. The QM7b data is composed of well-defined clusters related to atomic composition. The QM9 data consists of an outer region predominantly composed of outliers, and an inner, core region that concentrates clustered inliner objects. A significant relationship exists between the number of atoms in the molecule and its outlier/inliner nature. The spatial structure exhibits a relationship with molecular weight. Despite the structural differences between the two datasets, the predictability of variables of interest for inverse molecular design is high. This is exemplified by models estimating the number of atoms of the molecule from both the original properties and from lower dimensional embedding spaces. In the generative approach the input is given by a set of desired properties of the molecule and the output is an approximation of the atomic composition in terms of its constituent chemical elements. This could serve as the starting region for further search in the huge space determined by the set of possible chemical compounds. The quantum mechanic's dataset QM9 is used in the study, composed of 133,885 small organic molecules and 19 electronic properties. Different multi-target regression approaches were considered for predicting the atomic composition from the properties, including feature engineering techniques in an auto-machine learning framework. High-quality models were found that predict the atomic composition of the molecules from their electronic properties, as well as from a subset of only 52.6% size. Feature selection worked better than feature generation. The results validate the generative approach to inverse molecular design.

15.

Predicting preterm birth using auto-ML frameworks: a large observational study using electronic inpatient discharge data.

Kong, Deming; Tao, Ye; Xiao, Haiyan; Xiong, Huini; Wei, Weizhong; Cai, Miao.

Front Pediatr ; 12: 1330420, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38362001

RESUMO

Background: To develop and compare different AutoML frameworks and machine learning models to predict premature birth. Methods: The study used a large electronic medical record database to include 715,962 participants who had the principal diagnosis code of childbirth. Three Automatic Machine Learning (AutoML) were used to construct machine learning models including tree-based models, ensembled models, and deep neural networks on the training sample (N = 536,971). The area under the curve (AUC) and training times were used to assess the performance of the prediction models, and feature importance was computed via permutation-shuffling. Results: The H2O AutoML framework had the highest median AUC of 0.846, followed by AutoGluon (median AUC: 0.840) and Auto-sklearn (median AUC: 0.820), and the median training time was the lowest for H2O AutoML (0.14âmin), followed by AutoGluon (0.16âmin) and Auto-sklearn (4.33âmin). Among different types of machine learning models, the Gradient Boosting Machines (GBM) or Extreme Gradient Boosting (XGBoost), stacked ensemble, and random forrest models had better predictive performance, with median AUC scores being 0.846, 0.846, and 0.842, respectively. Important features related to preterm birth included premature rupture of membrane (PROM), incompetent cervix, occupation, and preeclampsia. Conclusions: Our study highlights the potential of machine learning models in predicting the risk of preterm birth using readily available electronic medical record data, which have significant implications for improving prenatal care and outcomes.

16.

Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic.

Kagerbauer, Simone Maria; Ulm, Bernhard; Podtschaske, Armin Horst; Andonov, Dimislav Ivanov; Blobner, Manfred; Jungwirth, Bettina; Graessner, Martin.

BMC Med Inform Decis Mak ; 24(1): 34, 2024 Feb 02.

Artigo em Inglês | MEDLINE | ID: mdl-38308256

RESUMO

BACKGROUND: Concept drift and covariate shift lead to a degradation of machine learning (ML) models. The objective of our study was to characterize sudden data drift as caused by the COVID pandemic. Furthermore, we investigated the suitability of certain methods in model training to prevent model degradation caused by data drift. METHODS: We trained different ML models with the H2O AutoML method on a dataset comprising 102,666 cases of surgical patients collected in the years 2014-2019 to predict postoperative mortality using preoperatively available data. Models applied were Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning and Stacked Ensembles comprising all base models. Further, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: (Rahmani K, et al, Int J Med Inform 173:104930, 2023) we weighted older data weaker, (Morger A, et al, Sci Rep 12:7244, 2022) used only the most recent data for model training and (Dilmegani C, 2023) performed a z-transformation of the numerical input parameters. Afterwards, we tested model performance on a pre-pandemic and an in-pandemic data set not used in the training process, and analysed common features. RESULTS: The models produced showed excellent areas under receiver-operating characteristic and acceptable precision-recall curves when tested on a dataset from January-March 2020, but significant degradation when tested on a dataset collected in the first wave of the COVID pandemic from April-May 2020. When comparing the probability distributions of the input parameters, significant differences between pre-pandemic and in-pandemic data were found. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of our applied modifications prevented a loss of performance, although very different models emerged from it, using a large variety of parameters. CONCLUSIONS: Our results show that none of our tested easy-to-implement measures in model training can prevent deterioration in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.

Assuntos

COVID-19 , Pandemias , Humanos , COVID-19/epidemiologia , Algoritmos , Mortalidade Hospitalar , Aprendizado de Máquina

17.

High-Throughput Analysis of Leaf Chlorophyll Content in Aquaponically Grown Lettuce Using Hyperspectral Reflectance and RGB Images.

Taha, Mohamed Farag; Mao, Hanping; Wang, Yafei; ElManawy, Ahmed Islam; Elmasry, Gamal; Wu, Letian; Memon, Muhammad Sohail; Niu, Ziang; Huang, Ting; Qiu, Zhengjun.

Plants (Basel) ; 13(3)2024 Jan 29.

Artigo em Inglês | MEDLINE | ID: mdl-38337925

RESUMO

Chlorophyll content reflects plants' photosynthetic capacity, growth stage, and nitrogen status and is, therefore, of significant importance in precision agriculture. This study aims to develop a spectral and color vegetation indices-based model to estimate the chlorophyll content in aquaponically grown lettuce. A completely open-source automated machine learning (AutoML) framework (EvalML) was employed to develop the prediction models. The performance of AutoML along with four other standard machine learning models (back-propagation neural network (BPNN), partial least squares regression (PLSR), random forest (RF), and support vector machine (SVM) was compared. The most sensitive spectral (SVIs) and color vegetation indices (CVIs) for chlorophyll content were extracted and evaluated as reliable estimators of chlorophyll content. Using an ASD FieldSpec 4 Hi-Res spectroradiometer and a portable red, green, and blue (RGB) camera, 3600 hyperspectral reflectance measurements and 800 RGB images were acquired from lettuce grown across a gradient of nutrient levels. Ground measurements of leaf chlorophyll were acquired using an SPAD-502 m calibrated via laboratory chemical analyses. The results revealed a strong relationship between chlorophyll content and SPAD-502 readings, with an R2 of 0.95 and a correlation coefficient (r) of 0.975. The developed AutoML models outperformed all traditional models, yielding the highest values of the coefficient of determination in prediction (Rp2) for all vegetation indices (VIs). The combination of SVIs and CVIs achieved the best prediction accuracy with the highest Rp2 values ranging from 0.89 to 0.98, respectively. This study demonstrated the feasibility of spectral and color vegetation indices as estimators of chlorophyll content. Furthermore, the developed AutoML models can be integrated into embedded devices to control nutrient cycles in aquaponics systems.

18.

Enhancing paranasal sinus disease detection with AutoML: efficient AI development and evaluation via magnetic resonance imaging.

Cheong, Ryan Chin Taw; Jawad, Susan; Adams, Ashok; Campion, Thomas; Lim, Zhe Hong; Papachristou, Nikolaos; Unadkat, Samit; Randhawa, Premjit; Joseph, Jonathan; Andrews, Peter; Taylor, Paul; Kunz, Holger.

Eur Arch Otorhinolaryngol ; 281(4): 2153-2158, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38197934

RESUMO

PURPOSE: Artificial intelligence (AI) in the form of automated machine learning (AutoML) offers a new potential breakthrough to overcome the barrier of entry for non-technically trained physicians. A Clinical Decision Support System (CDSS) for screening purposes using AutoML could be beneficial to ease the clinical burden in the radiological workflow for paranasal sinus diseases. METHODS: The main target of this work was the usage of automated evaluation of model performance and the feasibility of the Vertex AI image classification model on the Google Cloud AutoML platform to be trained to automatically classify the presence or absence of sinonasal disease. The dataset is a consensus labelled Open Access Series of Imaging Studies (OASIS-3) MRI head dataset by three specialised head and neck consultant radiologists. A total of 1313 unique non-TSE T2w MRI head sessions were used from the OASIS-3 repository. RESULTS: The best-performing image classification model achieved a precision of 0.928. Demonstrating the feasibility and high performance of the Vertex AI image classification model to automatically detect the presence or absence of sinonasal disease on MRI. CONCLUSION: AutoML allows for potential deployment to optimise diagnostic radiology workflows and lay the foundation for further AI research in radiology and otolaryngology. The usage of AutoML could serve as a formal requirement for a feasibility study.

Assuntos

Inteligência Artificial , Doenças dos Seios Paranasais , Humanos , Aprendizado de Máquina , Imageamento por Ressonância Magnética , Cabeça , Doenças dos Seios Paranasais/diagnóstico por imagem

19.

Machine Learning Made Easy (MLme): a comprehensive toolkit for machine learning-driven data analysis.

Akshay, Akshay; Katoch, Mitali; Shekarchizadeh, Navid; Abedi, Masoud; Sharma, Ankush; Burkhard, Fiona C; Adam, Rosalyn M; Monastyrskaya, Katia; Gheinani, Ali Hashemi.

Gigascience ; 132024 01 02.

Artigo em Inglês | MEDLINE | ID: mdl-38206587

RESUMO

BACKGROUND: Machine learning (ML) has emerged as a vital asset for researchers to analyze and extract valuable information from complex datasets. However, developing an effective and robust ML pipeline can present a real challenge, demanding considerable time and effort, thereby impeding research progress. Existing tools in this landscape require a profound understanding of ML principles and programming skills. Furthermore, users are required to engage in the comprehensive configuration of their ML pipeline to obtain optimal performance. RESULTS: To address these challenges, we have developed a novel tool called Machine Learning Made Easy (MLme) that streamlines the use of ML in research, specifically focusing on classification problems at present. By integrating 4 essential functionalities-namely, Data Exploration, AutoML, CustomML, and Visualization-MLme fulfills the diverse requirements of researchers while eliminating the need for extensive coding efforts. To demonstrate the applicability of MLme, we conducted rigorous testing on 6 distinct datasets, each presenting unique characteristics and challenges. Our results consistently showed promising performance across different datasets, reaffirming the versatility and effectiveness of the tool. Additionally, by utilizing MLme's feature selection functionality, we successfully identified significant markers for CD8+ naive (BACH2), CD16+ (CD16), and CD14+ (VCAN) cell populations. CONCLUSION: MLme serves as a valuable resource for leveraging ML to facilitate insightful data analysis and enhance research outcomes, while alleviating concerns related to complex coding scripts. The source code and a detailed tutorial for MLme are available at https://github.com/FunctionalUrology/MLme.

Assuntos

Análise de Dados , Aprendizado de Máquina , Humanos , Pesquisadores , Software

20.

An autonomous mixed data oversampling method for AIOT-based churn recognition and personalized recommendations using behavioral segmentation.

Fatima, Ghulam; Khan, Salabat; Aadil, Farhan; Kim, Do Hyuen; Atteia, Ghada; Alabdulhafith, Maali.

PeerJ Comput Sci ; 10: e1756, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38196952

RESUMO

The telecom sector is currently undergoing a digital transformation by integrating artificial intelligence (AI) and Internet of Things (IoT) technologies. Customer retention in this context relies on the application of autonomous AI methods for analyzing IoT device data patterns in relation to the offered service packages. One significant challenge in existing studies is treating churn recognition and customer segmentation as separate tasks, which diminishes overall system accuracy. This study introduces an innovative approach by leveraging a unified customer analytics platform that treats churn recognition and segmentation as a bi-level optimization problem. The proposed framework includes an Auto Machine Learning (AutoML) oversampling method, effectively handling three mixed datasets of customer churn features while addressing imbalanced-class distribution issues. To enhance performance, the study utilizes the strength of oversampling methods like synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC) and synthetic minority oversampling with encoded nominal and continuous features (SMOTE-ENC). Performance evaluation, using 10-fold cross-validation, measures accuracy and F1-score. Simulation results demonstrate that the proposed strategy, particularly Random Forest (RF) with SMOTE-NC, outperforms standard methods with SMOTE. It achieves accuracy rates of 79.24%, 94.54%, and 69.57%, and F1-scores of 65.25%, 81.87%, and 45.62% for the IBM, Kaggle Telco and Cell2Cell datasets, respectively. The proposed method autonomously determines the number and density of clusters. Factor analysis employing Bayesian logistic regression identifies influential factors for accurate customer segmentation. Furthermore, the study segments consumers behaviorally and generates targeted recommendations for personalized service packages, benefiting decision-makers.

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA