ABSTRACT
Cervical cancer remains a significant global health challenge. Owing to inadequate early-stage screening and healthcare disparities, a large number of women suffer from this disease, and mortality continues to rise. In this study, we present a precise approach using six machine learning models (decision tree, logistic regression, naïve Bayes, random forest, k-nearest neighbors, and support vector machine) to predict early-stage cervical cancer from 36 risk-factor attributes of 858 individuals. Two data-balancing techniques, the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), were used to mitigate class imbalance. Two distinct feature selection methods, Chi-square and the Least Absolute Shrinkage and Selection Operator (LASSO), were applied to rank the features most strongly associated with the disease, and an explainable artificial intelligence technique, Shapley Additive Explanations (SHAP), was integrated to clarify the model outcomes. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1-score, false-positive rate, false-negative rate, and area under the receiver operating characteristic curve. The decision tree performed best with Chi-square feature selection, achieving 97.60% accuracy, 98.73% sensitivity, 80% specificity, and 98.73% precision. With data balancing applied, the decision tree achieved 97% accuracy, 99.35% sensitivity, 69.23% specificity, and 97.45% precision. This research is focused on developing automated diagnostic frameworks to improve the detection and management of cervical cancer and to help healthcare professionals deliver more efficient, personalized care to their patients.
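As an illustration of the oversampling step described above, here is a minimal NumPy sketch of the SMOTE idea (interpolating between a minority-class sample and one of its nearest minority neighbors). This is a simplified stand-in, not the implementation the study used; the toy data below is purely hypothetical.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples (SMOTE idea):
    interpolate between a minority point and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]               # k nearest neighbours per point
    base = rng.integers(0, n, size=n_new)           # pick a base minority sample
    neigh = nn[base, rng.integers(0, min(k, n - 1), size=n_new)]
    lam = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[neigh] - X_min[base])

# Toy minority class: four points on the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_syn = smote_oversample(X_min, n_new=8, k=2, rng=0)
print(X_syn.shape)  # (8, 2)
```

Because each synthetic point is a convex combination of two real minority samples, the new points stay inside the minority class's local geometry rather than being drawn at random.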
ABSTRACT
Timely diagnosis of brain tumors using MRI and its potential impact on patient survival are critical issues addressed in this study. Traditional deep learning (DL) models often lack transparency, leading to skepticism among medical experts owing to their "black box" nature. This study addresses this gap by presenting an innovative approach for brain tumor detection. It utilizes a customized Convolutional Neural Network (CNN) model empowered by three advanced explainable artificial intelligence (XAI) techniques: Shapley Additive Explanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), and Gradient-weighted Class Activation Mapping (Grad-CAM). The study utilized the BR35H dataset, which includes 3060 brain MRI images encompassing both tumorous and non-tumorous cases. The proposed model achieved a remarkable training accuracy of 100% and validation accuracy of 98.67%. Precision, recall, and F1-score metrics demonstrated exceptional performance at 98.50%, confirming the accuracy of the model in tumor detection. Detailed result analysis, including a confusion matrix, comparison with existing models, and generalizability tests on other datasets, establishes the superiority of the proposed approach and sets a new benchmark for accuracy. By integrating a customized CNN model with XAI techniques, this research enhances trust in AI-driven medical diagnostics and offers a promising pathway for early tumor detection and potentially life-saving interventions.
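For reference, the Grad-CAM technique named above (in its standard formulation) localizes class evidence by weighting the last convolutional feature maps $A^k$ with the spatially pooled gradients of the class score $y^c$:

```latex
\alpha_k^c \;=\; \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A_{ij}^{k}},
\qquad
L_{\text{Grad-CAM}}^{c} \;=\; \operatorname{ReLU}\!\Bigl(\sum_{k}\alpha_k^c\, A^{k}\Bigr)
```

The ReLU keeps only regions that contribute positively to the class score, which is why the resulting heatmaps highlight suspected tumor areas rather than everything that influences the prediction.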
ABSTRACT
Explainable artificial intelligence (XAI) aims to offer machine learning (ML) methods that enable people to comprehend, properly trust, and create more explainable models. In medical imaging, XAI has been adopted to interpret deep learning black-box models and demonstrate the trustworthiness of machine decisions and predictions. In this work, we propose a deep learning and explainable AI-based framework for segmenting and classifying brain tumors. The proposed framework consists of two parts. In the first part, an encoder-decoder-based DeepLabv3+ architecture is implemented with Bayesian Optimization (BO)-based hyperparameter initialization. Features at different scales are extracted through the Atrous Spatial Pyramid Pooling (ASPP) technique and passed to the output layer for tumor segmentation. In the second part of the proposed framework, two customized models are proposed, named Inverted Residual Bottleneck 96 layers (IRB-96) and Inverted Residual Bottleneck Self-Attention (IRB-Self). Both models are trained on the selected brain tumor datasets, and features are extracted from the global average pooling and self-attention layers. The features are fused using a serial approach, and classification is performed. BO-based hyperparameter optimization of the neural network classifiers is performed, and the classification results are optimized. An XAI method named LIME is implemented to check the interpretability of the proposed models. The experimental process was performed on the Figshare dataset, and an average segmentation accuracy of 92.68% and a classification accuracy of 95.42% were obtained. Compared with state-of-the-art techniques, the proposed framework shows improved accuracy.
ABSTRACT
BACKGROUND/OBJECTIVES: Pancreatic cyst management can be distilled into three separate pathways (discharge, monitoring, or surgery) based on the risk of malignant transformation. This study compares the performance of artificial intelligence (AI) models to clinical care for this task. METHODS: Two explainable boosting machine (EBM) models were developed and evaluated using clinical features only, or clinical features and cyst fluid molecular markers (CFMM), using a publicly available dataset consisting of 850 cases (median age 64; 65% female) with independent training (429 cases) and holdout test cohorts (421 cases). There were 137 cysts with no malignant potential, 114 malignant cysts, and 599 IPMNs and MCNs. RESULTS: The EBM and EBM with CFMM models had higher accuracy for identifying patients requiring monitoring (0.88 and 0.82) and surgery (0.66 and 0.82), respectively, compared with current clinical care (0.62 and 0.58). For discharge, the EBM with CFMM model had a higher accuracy (0.91) than either the EBM model (0.84) or current clinical care (0.86). In the cohort of patients who underwent surgical resection, use of the EBM-CFMM model would have decreased the number of unnecessary surgeries by 59% (n = 92), increased correct surgeries by 7.5% (n = 11), identified patients who require monitoring by 122% (n = 76), and increased the number of patients correctly classified for discharge by 138% (n = 18) compared to clinical care. CONCLUSIONS: EBM models had greater sensitivity and specificity for identifying the correct management compared with either clinical management or previous AI models. The model predictions are demonstrated to be interpretable by clinicians.
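As background, an explainable boosting machine is a tree-based, cyclically boosted generalized additive model (the standard form used by the InterpretML library). Its prediction decomposes into per-feature shape functions plus optional pairwise interactions, which is what makes the models clinician-interpretable:

```latex
g\bigl(\mathbb{E}[y]\bigr) \;=\; \beta_0 \;+\; \sum_{i} f_i(x_i) \;+\; \sum_{i \neq j} f_{ij}(x_i, x_j)
```

Each shape function $f_i$ can be plotted directly, so a clinician can read off how, for example, a single clinical feature alone shifts the predicted risk of malignancy.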
ABSTRACT
As the global incidence of cancer continues to rise rapidly, the need for swift and precise diagnoses has become increasingly pressing. Pathologists commonly rely on H&E-panCK stain pairs for various aspects of cancer diagnosis, including the detection of occult tumor cells and the evaluation of tumor budding. Nevertheless, conventional chemical staining methods suffer from notable drawbacks, such as time-intensive processes and irreversible staining outcomes. The virtual staining technique, leveraging generative adversarial networks (GANs), has emerged as a promising alternative to chemical stains. This approach aims to transform biopsy scans (often H&E) into other stain types. Despite notable progress in recent years, current state-of-the-art virtual staining models confront challenges that hinder their efficacy, particularly in achieving accurate staining outcomes under specific conditions. These limitations have impeded the practical integration of virtual staining into diagnostic practices. To address the goal of producing virtual panCK stains capable of replacing chemical panCK, we propose an innovative multi-model framework. Our approach employs a combination of Mask-RCNN (for cell segmentation) and GAN models to extract cytokeratin distribution from chemical H&E images. Additionally, we introduce a tailored dynamic GAN model to convert H&E images into virtual panCK stains, integrating the derived cytokeratin distribution. Our framework is motivated by the fact that the unique pattern of the panCK stain is derived from cytokeratin distribution. As a proof of concept, we employ our virtual panCK stains to evaluate tumor budding in 45 H&E whole-slide images taken from breast cancer-invaded lymph nodes. Through thorough validation by both pathologists and the QuPath software, our virtual panCK stains demonstrate a remarkable level of accuracy. In stark contrast, the accuracy of state-of-the-art single cycleGAN virtual panCK stains is negligible.
To the best of our knowledge, this is the first instance of a multi-model virtual panCK framework and of the utilization of virtual panCK for tumor budding assessment. Our framework excels in generating dependable virtual panCK stains with significantly improved efficiency, thereby considerably reducing turnaround times in diagnosis. Furthermore, its outcomes are easily comprehensible even to pathologists who are not well-versed in computer technology. We firmly believe that our framework has the potential to advance the field of virtual staining, thereby making significant strides towards improved cancer diagnosis.
ABSTRACT
This review article offers a comprehensive analysis of current developments in the application of machine learning for cancer diagnostic systems. The effectiveness of machine learning approaches has become evident in improving the accuracy and speed of cancer detection, addressing the complexities of large and intricate medical datasets. This review aims to evaluate modern machine learning techniques employed in cancer diagnostics, covering various algorithms, including supervised and unsupervised learning, as well as deep learning and federated learning methodologies. Data acquisition and preprocessing methods for different types of data, such as imaging, genomics, and clinical records, are discussed. The paper also examines feature extraction and selection techniques specific to cancer diagnosis. Model training, evaluation metrics, and performance comparison methods are explored. Additionally, the review provides insights into the applications of machine learning in various cancer types and discusses challenges related to dataset limitations, model interpretability, multi-omics integration, and ethical considerations. The emerging field of explainable artificial intelligence (XAI) in cancer diagnosis is highlighted, emphasizing specific XAI techniques proposed to improve cancer diagnostics. These techniques include interactive visualization of model decisions and feature importance analysis tailored for enhanced clinical interpretation, aiming to enhance both diagnostic accuracy and transparency in medical decision-making. The paper concludes by outlining future directions, including personalized medicine, federated learning, deep learning advancements, and ethical considerations. This review aims to guide researchers, clinicians, and policymakers in the development of efficient and interpretable machine learning-based cancer diagnostic systems.
ABSTRACT
INTRODUCTION: Survival analysis based on cancer registry data is of paramount importance for monitoring the effectiveness of health care. As new methods arise, the compendium of statistical tools applicable to cancer registry data grows. In recent years, machine learning approaches for survival analysis have been developed. The aim of this study is to compare the performance of the well-established Cox regression and novel machine learning approaches on a previously unused dataset. MATERIAL AND METHODS: The study is based on lung cancer data from the Schleswig-Holstein Cancer Registry. Four survival analysis models are compared: Cox Proportional Hazards Regression (CoxPH) as the most commonly used statistical model, as well as Random Survival Forests (RSF) and two neural network architectures based on the DeepSurv and TabNet approaches. The models are evaluated using the concordance index (C-I), the Brier score, and the AUC-ROC score. In addition, to gain more insight into the decision process of the models, we identified the features that have a higher impact on patient survival using permutation feature importance scores and SHAP values. RESULTS: Using a dataset including the cancer stage established by the Union for International Cancer Control (UICC), the best-performing model is CoxPH (C-I: 0.698±0.005), while using a dataset that includes the tumor size, lymph node, and metastasis status (TNM) leads to RSF as the best-performing model (C-I: 0.703±0.004). The explainability metrics show that the models rely primarily on the combined UICC stage and the metastasis status, which corresponds to other studies. DISCUSSION: The studied methods are highly relevant for epidemiological researchers seeking to create more accurate survival models, which can help physicians make informed decisions about appropriate therapies and management of patients with lung cancer, ultimately improving survival and quality of life.
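The concordance index used to evaluate these models can be sketched in a few lines of plain Python. This is a simplified version of Harrell's C-index for right-censored data (pairs tied in time are simply skipped), and the data below are made-up toy values, with higher risk scores assumed to mean shorter survival:

```python
from itertools import combinations

def concordance_index(times, events, risk):
    """Harrell's C-index: fraction of comparable pairs whose risk
    ordering matches their survival ordering. A pair is comparable
    when the subject with the earlier time had an observed event."""
    concordant, ties, comparable = 0, 0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:          # order so that i has the earlier time
            i, j = j, i
        if times[i] == times[j] or not events[i]:
            continue                     # tie in time, or earlier case censored
        comparable += 1
        if risk[i] > risk[j]:
            concordant += 1
        elif risk[i] == risk[j]:
            ties += 1
    return (concordant + 0.5 * ties) / comparable

times  = [2, 4, 6, 8]          # follow-up times
events = [1, 1, 0, 1]          # 1 = death observed, 0 = censored
risk   = [0.9, 0.7, 0.3, 0.1]  # model risk scores
print(concordance_index(times, events, risk))  # 1.0 (perfect ranking)
```

A C-I of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the CoxPH and RSF scores above (≈0.70) should be read.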
Subjects
Lung Neoplasms, Machine Learning, Proportional Hazards Models, Humans, Lung Neoplasms/mortality, Survival Analysis, Female, Male, Neural Networks (Computer), Middle Aged, Aged, Registries
ABSTRACT
Recent advancements in spatial imaging technologies have revolutionized the acquisition of high-resolution multichannel images, gene expressions, and spatial locations at the single-cell level. Our study introduces xSiGra, an interpretable graph-based AI model, designed to elucidate interpretable features of identified spatial cell types, by harnessing multimodal features from spatial imaging technologies. By constructing a spatial cellular graph with immunohistology images and gene expression as node attributes, xSiGra employs hybrid graph transformer models to delineate spatial cell types. Additionally, xSiGra integrates a novel variant of gradient-weighted class activation mapping component to uncover interpretable features, including pivotal genes and cells for various cell types, thereby facilitating deeper biological insights from spatial data. Through rigorous benchmarking against existing methods, xSiGra demonstrates superior performance across diverse spatial imaging datasets. Application of xSiGra on a lung tumor slice unveils the importance score of cells, illustrating that cellular activity is not solely determined by itself but also impacted by neighboring cells. Moreover, leveraging the identified interpretable genes, xSiGra reveals endothelial cell subset interacting with tumor cells, indicating its heterogeneous underlying mechanisms within complex cellular interactions.
Subjects
Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Algorithms, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Lung Neoplasms/metabolism, Computational Biology/methods
ABSTRACT
Introduction: Radiation therapy (RT) is one of the primary treatment options for early-stage non-small cell lung cancer (ES-NSCLC). Therefore, accurately predicting the overall survival (OS) rate following radiotherapy is crucial for implementing personalized treatment strategies. This work aims to develop a dual-radiomics (DR) model to (1) predict 3-year OS in ES-NSCLC patients receiving RT using pre-treatment CT images, and (2) provide explanations of the relationship between feature importance and model prediction performance. Methods: The publicly available TCIA Lung1 dataset with 132 ES-NSCLC patients who received RT was studied: 89/43 patients in the under/over 3-year OS group. For each patient, two types of radiomic features were examined: 56 handcrafted radiomic features (HRFs) extracted within the gross tumor volume, and 512 image deep features (IDFs) extracted using a pre-trained U-Net encoder. They were combined as inputs to an explainable boosting machine (EBM) model for OS prediction. The EBM's mean absolute scores for HRFs and IDFs were used as feature-importance explanations. To evaluate the identified feature importance, the DR model was compared with an EBM using either (1) the key or (2) the non-key feature type only. Comparison studies with other models, including support vector machine (SVM) and random forest (RF), were also included. Performance was evaluated by the area under the receiver operating characteristic curve (AUCROC), accuracy, sensitivity, and specificity with 100-fold Monte Carlo cross-validation. Results: The DR model showed the highest performance in predicting 3-year OS (AUCROC = 0.81 ± 0.04), and EBM scores suggested that IDFs showed significantly greater importance (normalized mean score = 0.0019) than HRFs (score = 0.0008). The comparison studies showed that the EBM with the key feature type (IDFs only) demonstrated comparable AUCROC results (0.81 ± 0.04), while the EBM with the non-key feature type (HRFs only) showed limited AUCROC (0.64 ± 0.10).
The results suggested that the feature importance scores identified by the EBM are highly correlated with OS prediction performance. Both the SVM and RF models were unable to explain the key feature type while showing limited overall AUCROC (0.66 ± 0.07 and 0.77 ± 0.06, respectively). Accuracy, sensitivity, and specificity showed a similar trend. Discussion: In conclusion, a DR model was successfully developed to predict ES-NSCLC OS based on pre-treatment CT images. The results suggested that the feature importance from the DR model is highly correlated with the model's prediction power.
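A minimal sketch of the 100-fold Monte Carlo cross-validation protocol described above, using scikit-learn. Logistic regression and synthetic data stand in for the paper's EBM and radiomic features; only the evaluation pattern (repeated random train/test splits, AUC reported as mean ± std) is being illustrated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit

# Stand-in data; in the paper these would be the 56 HRFs + 512 IDFs per patient.
X, y = make_classification(n_samples=132, n_features=20, random_state=0)

# Monte Carlo cross-validation: repeated random train/test splits.
splitter = ShuffleSplit(n_splits=100, test_size=0.25, random_state=0)
aucs = []
for train_idx, test_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))

print(f"AUCROC = {np.mean(aucs):.2f} \u00b1 {np.std(aucs):.2f}")
```

Unlike k-fold cross-validation, Monte Carlo splits may overlap between repeats, but the large number of repeats gives a stable estimate of the AUC spread, which is what the ± figures in the abstract report.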
ABSTRACT
OBJECTIVES: Breast cancer is a type of cancer caused by the uncontrolled growth of cells in the breast tissue. In a few cases, erroneous diagnosis of breast cancer by specialists and unnecessary biopsies can lead to various negative consequences. In some cases, radiologic examinations or clinical findings may raise the suspicion of breast cancer, but subsequent detailed evaluations may not confirm cancer. In addition to causing unnecessary anxiety and stress to patients, such diagnosis can also lead to unnecessary biopsy procedures, which are painful, expensive, and prone to misdiagnosis. Therefore, there is a need for the development of more accurate and reliable methods for breast cancer diagnosis. METHODS: In this study, we proposed an artificial intelligence (AI)-based method for automatically classifying breast solid mass lesions as benign vs malignant. In this study, a new breast cancer dataset (Breast-XD) was created with 791 solid mass lesions belonging to 752 different patients aged 18 to 85 years, which were examined by experienced radiologists between 2017 and 2022. RESULTS: Six classifiers, support vector machine (SVM), K-nearest neighbor (K-NN), random forest (RF), decision tree (DT), logistic regression (LR), and XGBoost, were trained on the training samples of the Breast-XD dataset. Then, each classifier made predictions on 159 test data that it had not seen before. The highest classification result was obtained using the explainable XGBoost model (X2GAI) with an accuracy of 94.34%. An explainable structure is also implemented to build the reliability of the developed model. CONCLUSIONS: The results obtained by radiologists and the X2GAI model were compared according to the diagnosis obtained from the biopsy. It was observed that our developed model performed well in cases where experienced radiologists gave false positive results.
Assuntos
Inteligência Artificial , Neoplasias da Mama , Humanos , Feminino , Neoplasias da Mama/diagnóstico por imagem , Adulto , Idoso , Pessoa de Meia-Idade , Idoso de 80 Anos ou mais , Reprodutibilidade dos Testes , Adulto Jovem , Adolescente , Ultrassonografia Mamária/métodos , Radiologistas/estatística & dados numéricos , Mama/diagnóstico por imagem , Sensibilidade e Especificidade , Interpretação de Imagem Assistida por Computador/métodos , Diagnóstico DiferencialRESUMO
In the field of computational oncology, patient status is often assessed using radiology-genomics, which combines two key technologies and data types: radiology and genomics. Recent advances in deep learning have facilitated the integration of radiology-genomics data, and even new omics data, significantly improving the robustness and accuracy of clinical predictions. These factors are driving artificial intelligence (AI) closer to practical clinical applications. In particular, deep learning models are crucial in identifying new radiology-genomics biomarkers and therapeutic targets, supported by explainable AI (xAI) methods. This review focuses on recent developments in deep learning for radiology-genomics integration, highlights current challenges, and outlines research directions for the multimodal integration and biomarker discovery of radiology-genomics or radiology-omics that are urgently needed in computational oncology.
ABSTRACT
Deep learning architectures like ResNet and Inception have produced accurate predictions for classifying benign and malignant tumors in the healthcare domain. This enables healthcare institutions to make data-driven decisions and potentially enables early detection of malignancy by employing computer-vision-based deep learning algorithms. These CNN algorithms, in addition to requiring huge amounts of data, can identify higher- and lower-level features that are significant when classifying tumors as benign or malignant. However, the existing literature is limited in terms of the explainability of the resultant classification and the identification of the exact features that are of importance, which is essential in the decision-making process for healthcare practitioners. Thus, the motivation of this work is to implement a custom classifier on an ovarian tumor dataset that exhibits high classification performance, and subsequently to interpret the classification results qualitatively, using various explainable AI methods, to identify which pixels or regions of interest are given the highest importance by the model for classification. The dataset comprises CT images of ovarian tumors acquired in the axial, sagittal, and coronal planes. State-of-the-art architectures, including a modified ResNet50 derived from the standard pre-trained ResNet50, are implemented in the paper. When compared to existing state-of-the-art techniques, the proposed modified ResNet50 exhibited a classification accuracy of 97.5% on the test dataset without increasing the complexity of the architecture. The results were then interpreted using several explainable AI techniques. They show that the shape and localized nature of the tumors play important roles in qualitatively determining the ability of the tumor to metastasize and thereafter to be classified as benign or malignant.
ABSTRACT
BACKGROUND: Accurately diagnosing brain tumors from MRI scans is crucial for effective treatment planning. While traditional methods heavily rely on radiologist expertise, the integration of AI, particularly Convolutional Neural Networks (CNNs), has shown promise in improving accuracy. However, the lack of transparency in AI decision-making processes presents a challenge for clinical adoption. METHODS: Recent advancements in deep learning, particularly the utilization of CNNs, have facilitated the development of models for medical image analysis. In this study, we employed the EfficientNetB0 architecture and integrated explainable AI techniques to enhance both accuracy and interpretability. Grad-CAM visualization was utilized to highlight significant areas in MRI scans influencing classification decisions. RESULTS: Our model achieved a classification accuracy of 98.72% across four categories of brain tumors (Glioma, Meningioma, No Tumor, Pituitary), with precision and recall exceeding 97% for all categories. The incorporation of explainable AI techniques was validated through visual inspection of Grad-CAM heatmaps, which aligned well with established diagnostic markers in MRI scans. CONCLUSION: The AI-enhanced EfficientNetB0 framework with explainable AI techniques significantly improves brain tumor classification accuracy to 98.72%, offering clear visual insights into the decision-making process. This method enhances diagnostic reliability and trust, demonstrating substantial potential for clinical adoption in medical diagnostics.
Subjects
Brain Neoplasms, Deep Learning, Magnetic Resonance Imaging, Humans, Brain Neoplasms/diagnostic imaging, Magnetic Resonance Imaging/methods, Meningioma/diagnostic imaging, Glioma/diagnostic imaging, Neuroimaging/methods, Neuroimaging/standards, Computer-Assisted Image Interpretation/methods, Neural Networks (Computer)
ABSTRACT
In DCE-MRI, the degree of contrast uptake in normal fibroglandular tissue, i.e., background parenchymal enhancement (BPE), is a crucial biomarker linked to breast cancer risk and treatment outcome. In accordance with the Breast Imaging Reporting & Data System (BI-RADS), it should be visually classified into four classes. The susceptibility of such an assessment to inter-reader variability highlights the urgent need for a standardized classification algorithm. In this retrospective study, the first post-contrast subtraction images for 27 healthy female subjects were included. The BPE was classified slice-wise by two expert radiologists. The extraction of radiomic features from segmented BPE was followed by dataset splitting and dimensionality reduction. The latent representations were then utilized as inputs to a deep neural network classifying BPE into BI-RADS classes. The network's predictions were elucidated at the radiomic feature level with Shapley values. The deep neural network achieved a BPE classification accuracy of 84 ± 2% (p-value < 0.00001). Most of the misclassifications involved adjacent classes. Different radiomic features were decisive for the prediction of each BPE class, underscoring the complexity of the decision boundaries. A highly precise and explainable pipeline for BPE classification was achieved without user- or algorithm-dependent radiomic feature selection.
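For context, the Shapley values used here to explain predictions can be computed exactly for small feature sets by enumerating all coalitions; SHAP libraries approximate this efficiently for real models with many radiomic features. A brute-force sketch with a hypothetical toy "model" whose payoff is just a sum of per-feature contributions:

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley values by enumerating all coalitions.
    `value(S)` returns the model payoff for feature subset S."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # weight of this coalition in the Shapley average
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(set(S) | {f}) - value(set(S)))
        phi[f] = total
    return phi

# Toy additive model: payoff is the sum of per-feature contributions.
contrib = {"a": 2.0, "b": -1.0, "c": 0.5}
value = lambda S: sum(contrib[f] for f in S)
phi = shapley_values(value, list(contrib))
print(phi)  # for an additive game, Shapley recovers each contribution exactly
```

The efficiency property (the values sum to the full-model payoff minus the empty-model payoff) is what lets per-feature attributions be read as a complete decomposition of a single prediction.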
ABSTRACT
This study addresses the critical challenge of detecting brain tumors using MRI images, a pivotal task in medical diagnostics that demands high accuracy and interpretability. While deep learning has shown remarkable success in medical image analysis, there remains a substantial need for models that are not only accurate but also interpretable to healthcare professionals. The existing methodologies, predominantly deep learning-based, often act as black boxes, providing little insight into their decision-making process. This research introduces an integrated approach using ResNet50, a deep learning model, combined with Gradient-weighted Class Activation Mapping (Grad-CAM) to offer a transparent and explainable framework for brain tumor detection. We employed a dataset of MRI images, enhanced through data augmentation, to train and validate our model. The results demonstrate a significant improvement in model performance, with a testing accuracy of 98.52% and precision-recall metrics exceeding 98%, showcasing the model's effectiveness in distinguishing tumor presence. The application of Grad-CAM provides insightful visual explanations, illustrating the model's focus areas in making predictions. This fusion of high accuracy and explainability holds profound implications for medical diagnostics, offering a pathway towards more reliable and interpretable brain tumor detection tools.
Subjects
Brain Neoplasms, Deep Learning, Magnetic Resonance Imaging, Humans, Brain Neoplasms/diagnostic imaging, Magnetic Resonance Imaging/methods, Computer-Assisted Image Interpretation/methods
ABSTRACT
The diagnosis and treatment of vocal fold disorders heavily rely on the use of laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopy observation. However, existing approaches have yet to explore the joint optimization of the decision-making process, including object detection and image classification tasks simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy images. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To further facilitate interpretability for clinicians, MEAL provides attention maps to visualize important learned regions for explainable artificial intelligence results toward supporting clinical decision-making. We also analyze our model's effectiveness in simulated clinical scenarios where shaking occurs during the laryngoscopy process. The proposed model demonstrates outstanding performance on our VoFoCD dataset. The accuracy for image classification and mean average precision at an intersection-over-union threshold of 0.5 (mAP50) for object detection are 0.951 and 0.874, respectively. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions, including benign and malignant lesions. Our contribution can effectively aid laryngologists in visually identifying benign or malignant lesions of the vocal folds and classifying images in the laryngeal endoscopy process.
ABSTRACT
Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for nonexperts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested our approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, surpassed standalone ML algorithms like SVM and k-Nearest Neighbors in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers. The effectiveness of Auto-sklearn is highlighted by its AUC scores of 0.97 for RCC and 0.85 for OC, obtained from the unseen test sets. Importantly, on most of the metrics considered, Auto-sklearn demonstrated a better classification performance, leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. 
In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.
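A minimal stand-in for what Auto-sklearn automates here — comparing candidate algorithms by cross-validated AUC and keeping the best — can be sketched with scikit-learn. The real AutoML pipeline also searches hyperparameters and builds ensembles, and the synthetic data below is a placeholder for a metabolomics matrix (samples × metabolite intensities):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data standing in for a metabolomics feature matrix.
X, y = make_classification(n_samples=120, n_features=30, random_state=1)

# Candidate algorithms, compared by 5-fold cross-validated AUC.
candidates = {
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "rf": RandomForestClassifier(random_state=1),
}
scores = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Even this crude search illustrates why the abstract reports Auto-sklearn beating standalone SVM and k-NN: model choice itself is treated as a tunable parameter rather than fixed up front.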
Subjects
Kidney Neoplasms , Machine Learning , Metabolomics , Ovarian Neoplasms , Humans , Metabolomics/methods , Female , Ovarian Neoplasms/metabolism , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/blood , Kidney Neoplasms/metabolism , Kidney Neoplasms/diagnosis , Kidney Neoplasms/blood , Kidney Neoplasms/urine , Algorithms , Renal Cell Carcinoma/metabolism , Renal Cell Carcinoma/diagnosis , Tumor Biomarkers/blood , Tumor Biomarkers/analysis , Tumor Biomarkers/urine , Tumor Biomarkers/metabolism
ABSTRACT
Breast cancer has rapidly increased in prevalence in recent years, making it one of the leading causes of cancer mortality worldwide; it is also among the most commonly diagnosed cancers. Diagnosing this illness manually requires significant time and expertise. Since detecting breast cancer is a time-consuming process, machine-based predictions can help prevent its further spread. Machine learning and Explainable AI are crucial in classification: they not only provide accurate predictions but also offer insight into how a model arrives at its decisions, aiding the understanding and trustworthiness of the classification results. In this study, we evaluate and compare the classification accuracy, precision, recall, and F1 scores of five different machine learning methods using a primary dataset (500 patients from Dhaka Medical College Hospital). Five supervised machine learning techniques, namely decision tree, random forest, logistic regression, naive Bayes, and XGBoost, were applied to achieve optimal results on our dataset. Additionally, this study applied SHAP analysis to the XGBoost model to interpret its predictions and understand the impact of each feature on the model's output. We compared the classification accuracy of the algorithms and contrasted our results with other literature in this field. After final evaluation, this study found that XGBoost achieved the best model accuracy, at 97%.
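The comparison protocol above (five supervised models scored on accuracy, precision, recall, and F1) can be sketched as follows. This is a hedged illustration, not the study's code: scikit-learn's built-in breast cancer dataset stands in for the Dhaka Medical College data, and GradientBoostingClassifier stands in for XGBoost.

```python
# Five-model comparison sketch on a public breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=42, stratify=y)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=5000),
    "naive_bayes": GaussianNB(),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),  # XGBoost stand-in
}

# Score each model on the same held-out split with the four metrics
scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
        "f1": f1_score(y_te, pred),
    }
```

With real XGBoost, the final step of the study would pass the fitted booster to a SHAP TreeExplainer to obtain per-feature attributions.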
Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Bayes Theorem , Bangladesh/epidemiology , Breast , Machine Learning , Hydrolases
ABSTRACT
In recent years, there has been increasing interest in adopting Explainable Artificial Intelligence (XAI) for healthcare. The proposed system includes:
• An XAI model for cancer drug value prediction. The model provides data that is easy to understand and explain, which is critical for medical decision-making. It also produces accurate projections.
• A model that outperformed existing models owing to extensive training and evaluation on a large dataset of cancer medication chemical compounds.
• Insights into the causation and correlation between the dependent and independent factors in the chemical composition of the cancer cell.
While the model is evaluated on lung cancer data, the architecture offered in the proposed solution is cancer-agnostic. It may be scaled out to other cancer cell data if the properties are similar. The work presents a viable route for customizing treatments and improving patient outcomes in oncology by combining XAI with a large dataset. This research attempts to create a framework in which a user can upload a test case and receive forecasts with explanations, all in a portable PDF report.
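The kind of easy-to-explain prediction the system describes can be illustrated with a minimal sketch: a linear model over chemical-descriptor features, where each prediction decomposes exactly into per-feature contributions of the sort a report could display. The data and descriptors below are synthetic assumptions, not the lung cancer compound dataset.

```python
# Interpretable drug-value predictor sketch: additive per-feature explanations.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))                 # six hypothetical chemical descriptors
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)  # synthetic drug-value target

model = Ridge(alpha=1.0).fit(X, y)

# For one compound, the prediction is the intercept plus a sum of
# per-descriptor contributions (coefficient times descriptor value),
# which is exactly the additive breakdown an XAI report would tabulate.
sample = X[0]
contributions = model.coef_ * sample
prediction = model.intercept_ + contributions.sum()
```

For nonlinear models, SHAP values provide the analogous additive decomposition; the linear case is simply the one where the decomposition is exact and direct.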
ABSTRACT
Although the detection procedure has been shown to be highly effective, several obstacles remain in the use of AI-assisted cancer cell detection in clinical settings. These issues stem mostly from the inability to identify the underlying decision processes. Because AI-assisted diagnosis does not offer a transparent decision-making process, clinicians remain skeptical of it. Here, the advent of Explainable Artificial Intelligence (XAI), which provides explanations for prediction models, addresses the AI black-box issue. The main emphasis of this work is the SHapley Additive exPlanations (SHAP) approach, which yields interpretations of model predictions. The model in this study was a hybrid of three Convolutional Neural Networks (CNNs), namely InceptionV3, InceptionResNetV2, and VGG16, whose predictions were combined. The KvasirV2 dataset, which comprises pathological findings associated with cancer, was used to train the model. Our combined model yielded an accuracy of 93.17% and an F1 score of 97%. After training the combined model, we used SHAP to analyze images from these three groups to explain the decisions driving the model's predictions.
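The prediction-combining step of the hybrid model can be sketched as simple probability averaging over the three backbones. The arrays below are synthetic stand-ins for the softmax outputs of InceptionV3, InceptionResNetV2, and VGG16; the study's actual combination rule may differ.

```python
# Hybrid-CNN combination sketch: average three models' class probabilities.
import numpy as np

rng = np.random.default_rng(0)

def fake_backbone_probs(n_images, n_classes, rng):
    """Stand-in for one CNN's softmax output: each row sums to 1."""
    logits = rng.normal(size=(n_images, n_classes))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

n_images, n_classes = 8, 3  # e.g. a small batch over three image classes
backbones = [fake_backbone_probs(n_images, n_classes, rng) for _ in range(3)]

# Hybrid prediction: unweighted mean of the three probability maps,
# then argmax for the final class decision
combined = np.mean(backbones, axis=0)
predicted_class = combined.argmax(axis=1)
```

SHAP's image explainers would then attribute each final prediction back to pixel regions, which is the per-image analysis described above.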