Results 1 - 20 of 111
1.
Methods Mol Biol ; 2847: 63-93, 2025.
Article in English | MEDLINE | ID: mdl-39312137

ABSTRACT

Machine learning algorithms, and in particular deep learning approaches, have recently garnered attention in the field of molecular biology due to remarkable results. In this chapter, we describe machine learning approaches specifically developed for the design of RNAs, with a focus on the learna_tools Python package, a collection of automated deep reinforcement learning algorithms for secondary structure-based RNA design. We explain the basic concepts of reinforcement learning and its extension, automated reinforcement learning, and outline how these concepts can be successfully applied to the design of RNAs. The chapter is structured to guide the reader through the usage of the different programs with explicit examples, highlighting particular applications of the individual tools.


Subjects
Algorithms , Machine Learning , Nucleic Acid Conformation , RNA , Software , RNA/chemistry , RNA/genetics , Computational Biology/methods , Deep Learning
2.
Chin Med ; 19(1): 127, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39278905

ABSTRACT

The aim of this study was to develop a machine learning-assisted rapid determination methodology for traditional Chinese Medicine Constitution. Based on the Constitution in Chinese Medicine Questionnaire (CCMQ), the most widely applied diagnostic instrument for assessing individuals' constitutions, we employed automated supervised machine learning algorithms (i.e., Tree-based Pipeline Optimization Tool; TPOT) on all the possible item combinations for each subscale and an unsupervised machine learning algorithm (i.e., variable clustering; varclus) on the whole scale to select items that can best predict body constitution (BC) classifications or BC scores. By utilizing subsets of items selected based on TPOT and corresponding machine learning algorithms, the accuracies of BC classification prediction ranged from 0.819 to 0.936, with the root mean square errors of BC score prediction ranging from 6.241 to 9.877. Overall, the results suggested that the automated machine learning algorithms performed better than the varclus algorithm for item selection. Additionally, based on an automated machine learning item selection procedure, we provided the top three ranked item combinations with each possible subscale length, along with their corresponding algorithms for predicting BC classification and severity. This approach could accommodate the needs of different practitioners in traditional Chinese medicine for rapid constitution determination.
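The item-selection step above evaluates every possible combination of items within a subscale. Below is a minimal sketch of that exhaustive search. It assumes a hypothetical `score_fn` that stands in for the cross-validated metric of a fitted TPOT pipeline; the study's actual scoring setup is not given here. The item names and weights in the usage example are invented for illustration.

```python
from itertools import combinations
from math import comb


def rank_item_subsets(items, length, score_fn, top=3):
    """Score every subset of `length` items and return the `top` best as (score, subset).

    `score_fn` is a hypothetical callback; in practice it would fit and
    cross-validate a model restricted to the given items.
    """
    scored = [(score_fn(subset), subset) for subset in combinations(items, length)]
    # Sort by score only; ties keep the order produced by combinations().
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Invented per-item weights, used only to make the example deterministic.
    weights = {"q1": 1, "q2": 4, "q3": 3, "q4": 2}
    best = rank_item_subsets(list(weights), 2, lambda s: sum(weights[i] for i in s))
    print(best)  # [(7, ('q2', 'q3')), (6, ('q2', 'q4')), (5, ('q1', 'q2'))]
    # The search grows combinatorially: 8 items give comb(8, 3) = 56 three-item subsets.
    print(comb(8, 3))  # 56
```

The number of candidate subsets is C(n, k), so this brute-force design is practical only for short subscales such as those of a questionnaire.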

3.
J Minim Invasive Surg ; 27(3): 129-137, 2024 Sep 15.
Article in English | MEDLINE | ID: mdl-39300720

ABSTRACT

Recently, interest in machine learning (ML) has increased as the application fields have expanded significantly. Although ML methods excel in many fields, establishing an ML pipeline requires considerable time and human resources. Automated ML (AutoML) tools offer a solution by automating repetitive tasks, such as data preprocessing, model selection, hyperparameter optimization, and prediction analysis. This review introduces the use of AutoML tools for general research, including clinical studies. In particular, it outlines a simple approach that is accessible to beginners using the R programming language (R Foundation for Statistical Computing). In addition, the practical code and output results for binary classification are provided to facilitate direct application by clinical researchers in future studies.

4.
Sci Rep ; 14(1): 22658, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39349512

ABSTRACT

This study evaluates the diagnostic efficacy of automated machine learning (AutoGluon) with automated feature engineering and selection (autofeat), focusing on clinical manifestations, and a model integrating both clinical manifestations and CT findings in adult patients with ambiguous computed tomography (CT) results for acute appendicitis (AA). This evaluation was compared with conventional single machine learning models such as logistic regression (LR) and established scoring systems such as the Adult Appendicitis Score (AAS) to address the gap in diagnostic approaches for uncertain AA cases. In this retrospective analysis of 303 adult patients with indeterminate CT findings, the cohort was divided into appendicitis (n = 115) and non-appendicitis (n = 188) groups. AutoGluon and autofeat were used for AA prediction. The AutoGluon-clinical model relied solely on clinical data, whereas the AutoGluon-clinical-CT model included both clinical and CT data. The area under the receiver operating characteristic curve (AUROC) and other metrics for the test dataset, namely accuracy, sensitivity, specificity, PPV, NPV, and F1 score, were used to compare AutoGluon models with single machine learning models and the AAS. The single ML models in this study were LR, LASSO regression, ridge regression, support vector machine, decision tree, random forest, and extreme gradient boosting. Feature importance values were extracted using AutoGluon's "feature_importance" method. The AutoGluon-clinical model demonstrated an AUROC of 0.785 (95% CI 0.691-0.890), and the ridge regression model with only clinical data showed an AUROC of 0.755 (95% CI 0.649-0.861). The AutoGluon-clinical-CT model (AUROC 0.886 with 95% CI 0.820-0.951) performed better than the ridge model using clinical and CT data (AUROC 0.852 with 95% CI 0.774-0.930, p = 0.029). A new feature, exp(-(duration from pain to CT)^3 + rebound tenderness), was identified (importance = 0.049, p = 0.001).
AutoML (AutoGluon) and autoFE (autofeat) enhanced the diagnosis of uncertain AA cases, particularly when combining CT and clinical findings. This study suggests the potential of integrating AutoML and autoFE in clinical settings to improve diagnostic strategies and patient outcomes and make more efficient use of healthcare resources. Moreover, this research supports further exploration of machine learning in diagnostic processes.
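The AUROC values with 95% CIs reported above can be reproduced in outline with a rank-based AUROC and a percentile bootstrap. The abstract does not say how its intervals were computed. The bootstrap below is an assumption: it resamples cases with replacement using a fixed seed. The function names are hypothetical.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC: probability that a random positive scores above a random negative.

    Ties count as half a win (equivalent to Mann-Whitney U / (n_pos * n_neg)).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        # Skip resamples that contain only one class, where AUROC is undefined.
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y, s))  # 0.75
    print(bootstrap_auroc_ci(y, s, n_boot=500))
```

The percentile bootstrap is only one option. Comparisons between two correlated AUROCs measured on the same patients, such as the p = 0.029 above, are often made with DeLong's test instead; that test is not shown here.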


Subjects
Appendicitis , Machine Learning , Tomography, X-Ray Computed , Humans , Appendicitis/diagnostic imaging , Appendicitis/diagnosis , Male , Tomography, X-Ray Computed/methods , Female , Adult , Retrospective Studies , Middle Aged , ROC Curve
5.
ESMO Open ; 9(8): 103595, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39088983

ABSTRACT

BACKGROUND: Early screening using low-dose computed tomography (LDCT) can reduce mortality caused by non-small-cell lung cancer. However, ∼25% of the 'suspicious' pulmonary nodules identified by LDCT are later confirmed benign through resection surgery, adding to patients' discomfort and the burden on the healthcare system. In this study, we aim to develop a noninvasive liquid biopsy assay for distinguishing pulmonary malignancy from benign yet 'suspicious' lung nodules using cell-free DNA (cfDNA) fragmentomics profiling. METHODS: An independent training cohort consisting of 193 patients with malignant nodules and 44 patients with benign nodules was used to construct a machine learning model. Base models using four different fragmentomics profiles were optimized using an automated machine learning approach before being stacked into the final predictive model. An independent validation cohort, including 96 malignant nodules and 22 benign nodules, and an external test cohort, including 58 malignant nodules and 41 benign nodules, were used to assess the performance of the stacked ensemble model. RESULTS: Our machine learning models demonstrated excellent performance in detecting patients with malignant nodules. The areas under the curve (AUCs) reached 0.857 and 0.860 in the independent validation cohort and the external test cohort, respectively. At the targeted 90% sensitivity (observed sensitivity 89.6%), the validation cohort achieved a specificity of 68.2%. A similar performance was observed when the same cut-off was applied to the external cohort, which reached a specificity of 63.4% at 89.7% sensitivity. A subgroup analysis of the independent validation cohort showed that the sensitivities for detecting various subgroups of nodule size (<1 cm: 91.7%; 1-3 cm: 88.1%; >3 cm: 100%; unknown: 100%) and smoking history (yes: 88.2%; no: 89.9%) all remained high among the lung cancer group.
CONCLUSIONS: Our cfDNA fragmentomics assay can provide a noninvasive approach to distinguishing malignant nodules from radiographically suspicious but pathologically benign ones, amending LDCT false positives.
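The cut-off procedure described above fixes a threshold on the validation cohort to reach a target sensitivity, then applies it unchanged to the external cohort. It might look like the sketch below. The abstract does not describe the exact selection rule. Here we assume that scores at or above the cut-off are called malignant and that the highest cut-off reaching the target is chosen. The data in the example are invented.

```python
import math


def cutoff_for_sensitivity(labels, scores, target=0.90):
    """Highest cut-off whose sensitivity reaches `target`, calling score >= cut-off positive."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not pos:
        raise ValueError("no positive cases")
    # Number of positives that must score at or above the cut-off; the small
    # epsilon absorbs floating-point error in target * len(pos).
    needed = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[needed - 1]


def sensitivity_specificity(labels, scores, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        called_pos = s >= cutoff
        if y == 1 and called_pos:
            tp += 1
        elif y == 1:
            fn += 1
        elif called_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented validation data: 10 malignant (1) and 5 benign (0) cases.
    val_y = [1] * 10 + [0] * 5
    val_s = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.2,
             0.1, 0.3, 0.5, 0.6, 0.7]
    cut = cutoff_for_sensitivity(val_y, val_s)
    print(cut)                                          # 0.55
    print(sensitivity_specificity(val_y, val_s, cut))   # (0.9, 0.6)
    # The frozen cut-off would then be applied unchanged to an external cohort.
```

Freezing the cut-off before touching the external cohort is what makes the external specificity an honest estimate. Re-tuning it on that cohort would bias the result upward.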


Subjects
Cell-Free Nucleic Acids , Lung Neoplasms , Machine Learning , Humans , Lung Neoplasms/genetics , Lung Neoplasms/diagnosis , Lung Neoplasms/pathology , Female , Male , Middle Aged , Aged , Multiple Pulmonary Nodules/diagnostic imaging , Liquid Biopsy/methods , Early Detection of Cancer/methods , Tomography, X-Ray Computed/methods , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Carcinoma, Non-Small-Cell Lung/diagnosis
6.
J Hazard Mater ; 478: 135555, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-39186842

ABSTRACT

The accumulation of polyethylene microplastic (PE-MPs) in soil can significantly impact plant quality and yield, as well as affect human health and food chain cycles. Therefore, developing rapid and effective detection methods is crucial. In this study, traditional machine learning (ML) and H2O automated machine learning (H2O AutoML) were utilized to offer a powerful framework for detecting PE-MPs (0.1 %, 1 %, and 2 % by dry soil weight) and the co-contamination of PE-MPs and fomesafen (a common herbicide) in soil. The development of the framework was based on the results of the metabolic reprogramming of soybean plants. Our study showed that traditional ML exhibited lower accuracy due to the challenges associated with optimizing complex parameters. H2O AutoML can accurately distinguish between clean soil and contaminated soil. Notably, H2O AutoML can detect PE-MPs as low as 0.1 % (with 100 % accuracy) and co-contamination of PE-MPs and fomesafen (with 90 % accuracy) in soil. The VIP and SHAP analyses of the H2O AutoML showed that PE-MPs and the co-contamination of PE-MPs and fomesafen significantly interfered with the antioxidant system and energy regulation of soybean. We hope this study can provide a reliable scientific basis for sustainable development of the environment.


Subjects
Glycine max , Machine Learning , Microplastics , Soil Pollutants , Glycine max/metabolism , Glycine max/drug effects , Soil Pollutants/analysis , Soil Pollutants/metabolism , Microplastics/toxicity , Microplastics/analysis , Herbicides/analysis , Environmental Monitoring/methods , Polyethylene , Soil/chemistry , Metabolic Reprogramming
7.
Digit Health ; 10: 20552076241272535, 2024.
Article in English | MEDLINE | ID: mdl-39119551

ABSTRACT

Background: Nonalcoholic fatty liver disease (NAFLD) is recognized as one of the most common chronic liver diseases worldwide. This study aims to assess the efficacy of automated machine learning (AutoML) in the identification of NAFLD using a population-based cross-sectional database. Methods: All data, including laboratory examinations, anthropometric measurements, and demographic variables, were obtained from the National Health and Nutrition Examination Survey (NHANES). NAFLD was defined by controlled attenuation parameter (CAP) in liver transient ultrasound elastography. The least absolute shrinkage and selection operator (LASSO) regression analysis was employed for feature selection. Six algorithms were utilized on the H2O-automated machine learning platform: Gradient Boosting Machine (GBM), Distributed Random Forest (DRF), Extremely Randomized Trees (XRT), Generalized Linear Model (GLM), eXtreme Gradient Boosting (XGBoost), and Deep Learning (DL). These algorithms were selected for their diverse strengths, including their ability to handle complex, non-linear relationships, provide high predictive accuracy, and ensure interpretability. The models were evaluated by area under receiver operating characteristic curves (AUC) and interpreted by the calibration curve, the decision curve analysis, variable importance plot, SHapley Additive exPlanation plot, partial dependence plots, and local interpretable model agnostic explanation plot. Results: A total of 4177 participants (non-NAFLD 3167 vs NAFLD 1010) were included to develop and validate the AutoML models. The model developed by XGBoost performed better than other models in AutoML, achieving an AUC of 0.859, an accuracy of 0.795, a sensitivity of 0.773, and a specificity of 0.802 on the validation set. Conclusions: We developed an XGBoost model to better evaluate the presence of NAFLD. Based on the XGBoost model, we created an R Shiny web-based application named Shiny NAFLD (http://39.101.122.171:3838/App2/). 
This application demonstrates the potential of AutoML in clinical research and practice, offering a promising tool for the real-world identification of NAFLD.

8.
Front Cardiovasc Med ; 11: 1360548, 2024.
Article in English | MEDLINE | ID: mdl-39011494

ABSTRACT

Objective: This study focuses on the innovative application of Automated Machine Learning (AutoML) technology in cardiovascular medicine to construct an explainable Coronary Artery Disease (CAD) prediction model to support the clinical diagnosis of CAD. Methods: This study utilizes a combined data set of five public data sets related to CAD. An ensemble model is constructed using the AutoML open-source framework AutoGluon to evaluate the feasibility of AutoML in constructing a disease prediction model in cardiovascular medicine. The performance of the ensemble model is compared against individual baseline models. Finally, the disease prediction ensemble model is explained using SHapley Additive exPlanations (SHAP). Results: The experimental results show that the AutoGluon-based ensemble model performs better than the individual baseline models in predicting CAD. It achieved an accuracy of 0.9167 and an AUC of 0.9562 in 4-fold cross-bagging. SHAP measures the importance of each feature to the prediction of the model and explains the prediction results of the model. Conclusion: This study demonstrates the feasibility and efficacy of AutoML technology in cardiovascular medicine and highlights its potential in disease prediction. AutoML reduces the barriers to model building and significantly improves prediction accuracy. Additionally, the integration of SHAP enhances model transparency and explainability, which is critical to ensuring model credibility and widespread adoption in cardiovascular medicine.

9.
Natl Sci Rev ; 11(8): nwad292, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39007004

ABSTRACT

Formulating the methodology of machine learning by bilevel optimization techniques provides a new perspective to understand and solve automated machine learning problems.

10.
Natl Sci Rev ; 11(8): nwad300, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39007001

ABSTRACT

Dive into the novel OpenML paradigm, unveiling its transformative approach to robust AI in dynamic environments, shaping Automated Machine Learning with adaptability for groundbreaking advancements towards Artificial General Intelligence.

11.
Diagnostics (Basel) ; 14(11)2024 May 21.
Article in English | MEDLINE | ID: mdl-38893597

ABSTRACT

In this study, we sought to evaluate the capabilities of radiomics and machine learning in predicting seropositivity in patients with suspected autoimmune encephalitis (AE) from MR images obtained at symptom onset. In 83 patients diagnosed with AE between 2011 and 2022, manual bilateral segmentation of the amygdala was performed on pre-contrast T2 images using 3D Slicer open-source software. Our sample of 83 patients contained 43 seropositive and 40 seronegative AE cases. Images were obtained at our tertiary care center and at various secondary care centers in North Rhine-Westphalia, Germany. The sample was randomly split into training data and independent test data. A total of 107 radiomic features were extracted from bilateral regions of interest (ROIs). Automated machine learning (AutoML) was used to identify the most promising machine learning algorithms. Feature selection was performed using recursive feature elimination (RFE) and based on the determination of the most important features. Selected features were used to train various machine learning algorithms on 100 different data partitions. Performance was subsequently evaluated on independent test data. Our radiomics approach was able to predict the presence of autoantibodies in the independent test samples with a mean AUC of 0.90, a mean accuracy of 0.83, a mean sensitivity of 0.84 and a mean specificity of 0.82, with Lasso regression models yielding the most promising results. These results indicate that radiomics-based machine learning could be a promising tool in predicting the presence of autoantibodies in suspected AE patients. Given the implications of seropositivity for definitive diagnosis of suspected AE cases, this may expedite diagnostic workup even before results from specialized laboratory testing can be obtained. 
Furthermore, in conjunction with recent publications, our results indicate that characterization of AE subtypes by use of radiomics may become possible in the future, potentially allowing physicians to tailor treatment in the spirit of personalized medicine even before laboratory workup is completed.
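The protocol above trains and evaluates over 100 different random data partitions and reports mean performance. It can be sketched as a repeated stratified holdout. The abstract does not state the split ratio or whether partitions were stratified by serostatus. The 20% test fraction and the per-class stratification below are assumptions. `evaluate` is a hypothetical callback that trains on the first index list and returns a score on the second.

```python
import random
from statistics import mean


def stratified_split(labels, test_frac, rng):
    """Randomly split case indices so each class contributes about test_frac of its cases to the test set."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def repeated_holdout(labels, evaluate, n_repeats=100, test_frac=0.2, seed=0):
    """Average evaluate(train_idx, test_idx) over n_repeats independent random partitions.

    `evaluate` is a hypothetical callback that fits a model on the training indices
    and returns a metric (for example AUROC) on the test indices.
    """
    rng = random.Random(seed)
    return mean(evaluate(*stratified_split(labels, test_frac, rng)) for _ in range(n_repeats))


if __name__ == "__main__":
    # Class sizes taken from the abstract: 43 seropositive (1) and 40 seronegative (0) cases.
    labels = [1] * 43 + [0] * 40
    train, test = stratified_split(labels, 0.2, random.Random(1))
    print(len(train), len(test))  # 66 17
```

Stratifying keeps the roughly balanced seropositive/seronegative ratio in every test set. With only 83 patients, an unstratified split could otherwise produce noticeably skewed partitions.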

12.
Ann Acad Med Singap ; 53(3): 187-207, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38920245

ABSTRACT

Introduction: Automated machine learning (autoML) removes technical and technological barriers to building artificial intelligence models. We aimed to summarise the clinical applications of autoML, assess the capabilities of utilised platforms, evaluate the quality of the evidence trialling autoML, and gauge the performance of autoML platforms relative to conventionally developed models, as well as each other. Method: This review adhered to a prospectively registered protocol (PROSPERO identifier CRD42022344427). The Cochrane Library, Embase, MEDLINE and Scopus were searched from inception to 11 July 2022. Two researchers screened abstracts and full texts, extracted data and conducted quality assessment. Disagreement was resolved through discussion and, if required, by arbitration by a third researcher. Results: There were 26 distinct autoML platforms featured in 82 studies. Brain and lung diseases were the most common fields of study among 22 specialties. AutoML exhibited variable performance: area under the receiver operating characteristic curve (AUCROC) 0.35-1.00, F1-score 0.16-0.99, area under the precision-recall curve (AUPRC) 0.51-1.00. AutoML exhibited the highest AUCROC in 75.6% of trials, the highest F1-score in 42.3% of trials, and the highest AUPRC in 83.3% of trials. In autoML platform comparisons, AutoPrognosis and Amazon Rekognition performed strongest with unstructured and structured data, respectively. Quality of reporting was poor, with a median DECIDE-AI score of 14 of 27. Conclusion: A myriad of autoML platforms have been applied in a variety of clinical contexts. The performance of autoML compares well to bespoke computational and clinical benchmarks. Further work is required to improve the quality of validation studies. AutoML may facilitate a transition to data-centric development, and integration with large language models may enable AI to build itself to fulfil user-defined goals.


Subjects
Machine Learning , Humans , Lung Diseases/diagnosis , ROC Curve , Brain Diseases/diagnosis , Area Under Curve
14.
Sensors (Basel) ; 24(12)2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38931713

ABSTRACT

The rapid advancements in Artificial Intelligence of Things (AIoT) are pivotal for the healthcare sector, especially as the world moves toward an aging society, a transition expected by 2050. This paper presents an innovative AIoT-enabled data fusion system implemented at the CMUH Respiratory Intensive Care Unit (RICU) to address the high incidence of medical errors in ICUs. Medical errors are among the top three causes of mortality in healthcare facilities. ICU patients are particularly vulnerable to medical errors due to the complexity of their conditions and the critical nature of their care. We introduce a four-layer AIoT architecture designed to manage and deliver both real-time and non-real-time medical data within the CMUH-RICU. Our system demonstrates the capability to handle 22 TB of medical data annually with an average delay of 1.72 ms and a bandwidth of 65.66 Mbps. Additionally, we ensure the uninterrupted operation of the CMUH-RICU with a three-node Kafka streaming cluster, provided a failed node is repaired within 9 h, assuming a one-year node lifespan. A case study is presented in which the AI application for acute respiratory distress syndrome (ARDS), leveraging our AIoT data fusion approach, significantly improved the medical diagnosis rate from 52.2% to 93.3% and reduced mortality from 56.5% to 39.5%. The results underscore the potential of AIoT in enhancing patient outcomes and operational efficiency in the ICU setting.


Subjects
Artificial Intelligence , Intensive Care Units , Humans , Respiratory Distress Syndrome/therapy
15.
J Mol Biol ; : 168653, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38871176

ABSTRACT

Meiotic recombination plays a pivotal role in genetic evolution. Genetic variation induced by recombination is a crucial factor in generating biodiversity and a driving force for evolution. At present, the development of recombination hotspot prediction methods has encountered challenges related to insufficient feature extraction and limited generalization capabilities. This paper focuses on recombination hotspot prediction methods. We explored deep learning-based recombination hotspot prediction and scrutinized the shortcomings of prevalent models in addressing this challenge. To address these deficiencies, an automated machine learning approach was used to construct a recombination hotspot prediction model. The model combined sequence information with physicochemical properties by employing TF-IDF-Kmer and DNA composition components to acquire more effective feature data. Experimental results validate the effectiveness of the feature extraction method and automated machine learning technology used in this study. The final model was validated on three distinct datasets and yielded accuracy rates of 97.14%, 79.71%, and 98.73%, surpassing the current leading models by 2%, 2.56%, and 4%, respectively. In addition, we incorporated tools such as SHAP and AutoGluon to analyze the interpretability of black-box models, examined the impact of individual features on the results, and investigated the reasons behind the misclassification of samples. Finally, a recombination hotspot prediction application was established to give researchers easy access to the necessary information and tools. The outcomes of this research underscore the enormous potential of automated machine learning methods in gene sequence prediction.
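The abstract names TF-IDF-Kmer features without giving the exact weighting. Below is a minimal sketch under stated assumptions: each sequence is treated as a document of overlapping k-mers, term frequency is the k-mer count divided by the sequence's total k-mer count, and IDF is the unsmoothed log(N / df). The DNA-composition and physicochemical features that the study combines with it are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Overlapping k-mers of a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_kmer(seqs, k=3):
    """Return (vocab, matrix) with matrix[d][j] = TF-IDF of k-mer vocab[j] in sequence d.

    TF  = count of the k-mer / total k-mers in the sequence
    IDF = log(N / df), where df is the number of sequences containing the k-mer
    """
    docs = [Counter(kmers(s, k)) for s in seqs]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    df = {km: sum(1 for d in docs if km in d) for km in vocab}
    matrix = []
    for d in docs:
        total = sum(d.values())
        matrix.append([
            d[km] / total * math.log(n / df[km]) if total else 0.0
            for km in vocab
        ])
    return vocab, matrix


if __name__ == "__main__":
    vocab, m = tfidf_kmer(["ACGT", "ACGA"], k=2)
    print(vocab)  # ['AC', 'CG', 'GA', 'GT']
    # Shared k-mers (AC, CG) get weight 0; k-mers unique to one sequence get (1/3) * log(2).
    print(m)
```

IDF down-weights k-mers present in every sequence, so the resulting vectors emphasize motifs that distinguish sequences. Real pipelines often add smoothing to the IDF term, which this sketch omits.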

16.
Genome Biol ; 25(1): 159, 2024 06 17.
Article in English | MEDLINE | ID: mdl-38886757

ABSTRACT

BACKGROUND: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset? RESULTS: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful. CONCLUSIONS: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.


Subjects
RNA-Seq , Single-Cell Gene Expression Analysis , Animals , Humans , Cluster Analysis , Computational Biology/methods , Machine Learning , RNA-Seq/methods , Sequence Analysis, RNA/methods , Supervised Machine Learning
17.
Ophthalmol Sci ; 4(5): 100470, 2024.
Article in English | MEDLINE | ID: mdl-38827487

ABSTRACT

Purpose: Automated machine learning (AutoML) has emerged as a novel tool for medical professionals lacking coding experience, enabling them to develop predictive models for treatment outcomes. This study evaluated the performance of AutoML tools in developing models predicting the success of pneumatic retinopexy (PR) in treatment of rhegmatogenous retinal detachment (RRD). These models were then compared with custom models created by machine learning (ML) experts. Design: Retrospective multicenter study. Participants: Five hundred thirty-nine consecutive patients with primary RRD who underwent PR by a vitreoretinal fellow at 6 training hospitals between 2002 and 2022. Methods: We used 2 AutoML platforms: MATLAB Classification Learner and Google Cloud AutoML. Additional models were developed by computer scientists. We included patient demographics and baseline characteristics, including lens and macula status, RRD size, number and location of breaks, presence of vitreous hemorrhage and lattice degeneration, and physicians' experience. The dataset was split into a training set (n = 483) and a test set (n = 56). The training set, with a 2:1 success-to-failure ratio, was used to train the MATLAB models. Because Google Cloud AutoML requires a minimum of 1000 samples, the training set was tripled to create a new set with 1449 datapoints. Additionally, balanced datasets with a 1:1 success-to-failure ratio were created using Python. Main Outcome Measures: Single-procedure anatomic success rate, as predicted by the ML models. F2 scores and area under the receiver operating characteristic curve (AUROC) were used as primary metrics to compare models. Results: The best-performing AutoML model (F2 score: 0.85; AUROC: 0.90; MATLAB) showed comparable performance to the custom model (0.92, 0.86) when trained on the balanced datasets. However, training the AutoML model with imbalanced data yielded a misleadingly high AUROC (0.81) despite a low F2 score (0.2) and sensitivity (0.17).
Conclusions: We demonstrated the feasibility of using AutoML as an accessible tool for medical professionals to develop models from clinical data. Such models can ultimately aid in the clinical decision-making, contributing to better patient outcomes. However, outcomes can be misleading or unreliable if used naively. Limitations exist, particularly if datasets contain missing variables or are highly imbalanced. Proper model selection and data preprocessing can improve the reliability of AutoML tools. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
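The F2 score used above is the F-beta score with beta = 2. It weights recall (sensitivity) more heavily than precision: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below computes it from confusion-matrix counts. The counts in the example are hypothetical and are not the study's data. They only illustrate how a classifier with perfect precision but low sensitivity still receives a low F2.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion-matrix counts (beta=2 gives the F2 score)."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined); treat F as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 1 of 6 positive cases detected, no false positives.
    print(round(f_beta(tp=1, fp=0, fn=5), 3))          # 0.2   (recall is about 0.167)
    print(round(f_beta(tp=1, fp=0, fn=5, beta=1), 3))  # 0.286 (F1 for the same counts)
```

AUROC is threshold-free and summarizes ranking quality over all cut-offs. On imbalanced data, it can therefore look acceptable even when the chosen operating point misses most minority-class cases. A threshold-dependent metric such as F2 exposes that failure.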

18.
JMIR Form Res ; 8: e55855, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38738977

ABSTRACT

BACKGROUND: Psoriasis vulgaris (PsV) and psoriatic arthritis (PsA) are complex, multifactorial diseases significantly impacting health and quality of life. Predicting treatment response and disease progression is crucial for optimizing therapeutic interventions, yet challenging. Automated machine learning (AutoML) technology shows promise for rapidly creating accurate predictive models based on patient features and treatment data. OBJECTIVE: This study aims to develop highly accurate machine learning (ML) models using AutoML to address key clinical questions for PsV and PsA patients, including predicting therapy changes, identifying reasons for therapy changes, and factors influencing skin lesion progression or an abnormal Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) score. METHODS: Clinical study data from 309 PsV and PsA patients were extensively prepared and analyzed using AutoML to build and select the most accurate predictive models for each variable of interest. RESULTS: Therapy change at 24 weeks follow-up was modeled using the extreme gradient boosted trees classifier with early stopping (area under the receiver operating characteristic curve [AUC] of 0.9078 and logarithmic loss [LogLoss] of 0.3955 for the holdout partition). Key influencing factors included the initial systemic therapeutic agent, the Classification Criteria for Psoriatic Arthritis score at baseline, and changes in quality of life. An average blender incorporating three models (gradient boosted trees classifier, ExtraTrees classifier, and Eureqa generalized additive model classifier) with an AUC of 0.8750 and LogLoss of 0.4603 was used to predict therapy changes for 2 hypothetical patients, highlighting the significance of these factors. Treatments such as methotrexate or specific biologicals showed a lower propensity for change. 
An average blender of a random forest classifier, an extreme gradient boosted trees classifier, and a Eureqa classifier (AUC of 0.9241 and LogLoss of 0.4498) was used to estimate PASI (Psoriasis Area and Severity Index) change after 24 weeks. Primary predictors included the initial PASI score, change in pruritus levels, and change in therapy. A lower initial PASI score and consistently low pruritus were associated with better outcomes. BASDAI classification at onset was analyzed using an average blender of a Eureqa generalized additive model classifier, an extreme gradient boosted trees classifier with early stopping, and a dropout additive regression trees classifier with an AUC of 0.8274 and LogLoss of 0.5037. Influential factors included initial pain, disease activity, and Hospital Anxiety and Depression Scale scores for depression and anxiety. Increased pain, disease activity, and psychological distress generally led to higher BASDAI scores. CONCLUSIONS: The practical implications of these models for clinical decision-making in PsV and PsA can guide early investigation and treatment, contributing to improved patient outcomes.
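We assume that "average blender" denotes an unweighted mean of the member models' predicted probabilities, and that LogLoss is the usual binary cross-entropy. Under those assumptions, blending and scoring can be sketched as follows. The member models' probabilities in the example are invented, and the function names are hypothetical.

```python
import math


def average_blend(per_model_probs):
    """Unweighted mean of each case's predicted probability across models.

    per_model_probs: one list of positive-class probabilities per model, all the same length.
    """
    n_models = len(per_model_probs)
    return [sum(case) / n_models for case in zip(*per_model_probs)]


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


if __name__ == "__main__":
    # Invented probabilities from three hypothetical member models for two patients.
    model_a = [0.9, 0.2]
    model_b = [0.7, 0.4]
    model_c = [0.8, 0.3]
    blended = average_blend([model_a, model_b, model_c])  # approximately [0.8, 0.3]
    print(log_loss([1, 0], blended))  # approximately 0.29
```

Unlike AUC, LogLoss penalizes confident wrong predictions heavily. Averaging several models tends to temper extreme probabilities, which is one reason blends often score well on this metric.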

19.
Sci Rep ; 14(1): 12415, 2024 05 30.
Article in English | MEDLINE | ID: mdl-38816560

ABSTRACT

Gastrointestinal stromal tumors (GISTs) are a rare type of tumor that can develop liver metastasis (LIM), significantly impacting the patient's prognosis. This study aimed to predict LIM in GIST patients by constructing machine learning (ML) algorithms to assist clinicians in the decision-making process for treatment. Retrospective analysis was performed using the Surveillance, Epidemiology, and End Results (SEER) database, and cases from 2010 to 2015 were assigned to the developing sets, while cases from 2016 to 2017 were assigned to the testing set. Missing values were addressed using the multiple imputation technique. Four algorithms were utilized to construct the models, comprising traditional logistic regression (LR) and automated machine learning (AutoML) analysis such as gradient boost machine (GBM), deep neural net (DL), and generalized linear model (GLM). We evaluated the models' performance using LR-based metrics, including the area under the receiver operating characteristic curve (AUC), calibration curve, and decision curve analysis (DCA), as well as AutoML-based metrics, such as feature importance, SHapley Additive exPlanation (SHAP) Plots, and Local Interpretable Model Agnostic Explanation (LIME). A total of 6207 patients were included in this study, with 2683, 1780, and 1744 patients allocated to the training, validation, and test sets, respectively. Among the different models evaluated, the GBM model demonstrated the highest performance in the training, validation, and test cohorts, with respective AUC values of 0.805, 0.780, and 0.795. Furthermore, the GBM model outperformed other AutoML models in terms of accuracy, achieving 0.747, 0.700, and 0.706 in the training, validation, and test cohorts, respectively. Additionally, the study revealed that tumor size and tumor location were the most significant predictors influencing the AutoML model's ability to accurately predict LIM. 
The AutoML model utilizing the GBM algorithm for GIST patients can effectively predict the risk of LIM and provide clinicians with a reference for developing individualized treatment plans.
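The temporal validation design and the AUC metric described above can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical per-case fields (`year`, `label`, `score`) and toy records rather than SEER data; it computes AUC through its rank (Mann-Whitney) interpretation rather than through any particular library, and it does not reproduce the further division of the development cohort into training and validation sets, the multiple imputation step, or the AutoML models themselves.

```python
from dataclasses import dataclass


@dataclass
class Case:
    year: int     # year of diagnosis (hypothetical field)
    label: int    # 1 = liver metastasis (LIM) present, 0 = absent
    score: float  # model-predicted LIM risk (hypothetical)


def temporal_split(cases, dev_years=range(2010, 2016), test_years=range(2016, 2018)):
    """Assign 2010-2015 cases to development and 2016-2017 cases to test."""
    dev = [c for c in cases if c.year in dev_years]
    test = [c for c in cases if c.year in test_years]
    return dev, test


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive case is scored above
    a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy records, not SEER data
cases = [
    Case(2011, 1, 0.9), Case(2013, 0, 0.2), Case(2015, 0, 0.6),
    Case(2016, 1, 0.7), Case(2017, 0, 0.3), Case(2017, 1, 0.4), Case(2016, 0, 0.5),
]
dev_set, test_set = temporal_split(cases)
auc = roc_auc([c.label for c in test_set], [c.score for c in test_set])
print(len(dev_set), len(test_set), auc)  # 3 4 0.75
```

The pairwise loop is quadratic in cohort size; for thousands of patients a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, is the more practical choice.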


Subjects
Gastrointestinal Stromal Tumors , Liver Neoplasms , Machine Learning , SEER Program , Humans , Gastrointestinal Stromal Tumors/pathology , Liver Neoplasms/secondary , Male , Female , Middle Aged , Retrospective Studies , Aged , Prognosis , Adult , Algorithms , ROC Curve , Gastrointestinal Neoplasms/pathology
20.
J Am Soc Mass Spectrom ; 35(6): 1089-1100, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38690775

ABSTRACT

Metabolomics generates complex data that require advanced computational methods to yield biological insight. Although machine learning (ML) is promising, selecting the best algorithms and tuning hyperparameters remain challenging, particularly for nonexperts. Automated machine learning (AutoML) can streamline this process, but interpretability may remain an issue. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested the approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, outperformed standalone ML algorithms such as support vector machines (SVM) and k-nearest neighbors in distinguishing RCC patients from healthy controls and OC patients from those with other gynecological cancers. Auto-sklearn achieved AUC scores of 0.97 for RCC and 0.85 for OC on the unseen test sets. On most of the metrics considered, Auto-sklearn delivered better classification performance by leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by showing the influence of each metabolite on individual predictions. Dependence plots highlighted metabolite interactions, such as that between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Decision plots supported a detailed error analysis that contrasted feature importance for correctly versus incorrectly classified samples.
In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.
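The SHAP explanations mentioned above rest on Shapley values. As a rough sketch (not the shap library and not the authors' pipeline), the exact Shapley value of each feature can be computed by averaging its marginal contribution over every feature ordering, with features outside a coalition held at a baseline value. The three-feature model and its inputs are hypothetical stand-ins for metabolite intensities. The final check mirrors what a waterfall plot displays: the base value plus the per-feature contributions equals the individual prediction.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for model(x); features not yet 'added' keep baseline values."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]      # add feature i to the coalition
            new = model(current)
            phi[i] += new - prev   # marginal contribution of feature i in this ordering
            prev = new
    total = factorial(n)
    return [p / total for p in phi]


def toy_model(z):
    # Hypothetical score over three metabolite intensities, with one interaction term
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
# Local additivity (what a waterfall plot visualizes): base value + contributions = prediction
assert abs(toy_model(baseline) + sum(phi) - toy_model(x)) < 1e-9
print(phi)  # [3.5, 2.0, 1.5]
```

The interaction term is split evenly between the two features that share it, which is why the first and third features receive 1.5 each on top of their main effects. Enumerating all n! orderings is only feasible for a handful of features; the shap library instead relies on model-specific or sampling-based algorithms, and how absent features are handled depends on the explainer chosen.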


Subjects
Kidney Neoplasms , Machine Learning , Metabolomics , Ovarian Neoplasms , Humans , Metabolomics/methods , Female , Ovarian Neoplasms/metabolism , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/blood , Kidney Neoplasms/metabolism , Kidney Neoplasms/diagnosis , Kidney Neoplasms/blood , Kidney Neoplasms/urine , Algorithms , Carcinoma, Renal Cell/metabolism , Carcinoma, Renal Cell/diagnosis , Biomarkers, Tumor/blood , Biomarkers, Tumor/analysis , Biomarkers, Tumor/urine , Biomarkers, Tumor/metabolism