ABSTRACT
BACKGROUND: Shapley values have been used extensively in machine learning, not only to explain black-box models but also, among other tasks, to conduct model debugging, sensitivity and fairness analyses, and to select important features for robust modelling and further follow-up analyses. Shapley values satisfy axioms that promote fairness in distributing the contributions of features toward prediction or error reduction, accounting for non-linear relationships and interactions when complex machine learning models are employed. Recently, feature selection methods using predictive Shapley values and p-values, such as powershap, have been introduced. METHODS: We present a novel feature selection method, LLpowershap, that builds on these advances by employing loss-based Shapley values to identify informative features with minimal noise among the selected sets of features. We also enhance the calculation of p-values and statistical power to identify informative features and to estimate the required number of iterations of model development and testing. RESULTS: Our simulation results show that LLpowershap not only identifies a higher number of informative features but also outputs fewer noise features than other state-of-the-art feature selection methods. Benchmarking on four real-world datasets demonstrates higher or comparable predictive performance of LLpowershap relative to other Shapley-based wrapper methods and to filter methods. LLpowershap also achieves the best mean rank among the seven feature selection methods tested on the benchmark datasets. CONCLUSION: Our results demonstrate that LLpowershap is a viable wrapper feature selection method for large biomedical datasets and other settings.
Subject(s)
Algorithms, Machine Learning, Humans, Logistic Models, Computer Simulation
ABSTRACT
Agroecological systems are potential solutions to the environmental challenges of intensive agriculture. Indigenous communities, such as the Kamëntsá Biyá and Kamëntsá Inga from the Sibundoy Valley (SV) in Colombia, have their own ancient agroecological systems called chagras. However, these are threatened by population growth and the expansion of intensive agriculture. Establishing new chagras or enhancing existing ones faces impediments such as the need for continuous monitoring and mapping of agroecological potential, which is often costly and time-consuming. To address this limitation, we created a digital map of the Biodiversity Management Coefficient (BMC), a proxy of agroecological potential, using machine learning. We utilized 15 environmental predictors and in-situ BMC data from 800 chagras to train an XGBoost model capable of predicting a multiclass BMC structure with 70% accuracy. This model was deployed across the study area to map the extent and spatial distribution of BMC classes, providing detailed information on potential areas for new agroecological chagras as well as areas unsuitable for this purpose. The map captured footprints of past and present disturbance events in the SV, demonstrating its usefulness for agroecological planning. We highlight the most significant predictors and the optimal values that trigger higher BMC status.
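For orientation, the following is a minimal sketch of the kind of multiclass XGBoost workflow described above; the file name, column name, and class encoding are hypothetical stand-ins, not details taken from the study.

```python
# Minimal multiclass XGBoost sketch; "chagras_bmc.csv", "bmc_class", and the
# three-class encoding are illustrative assumptions, not the study's data.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("chagras_bmc.csv")        # 15 environmental predictors + BMC class
X = df.drop(columns=["bmc_class"])
y = df["bmc_class"]                        # e.g., 0 = low, 1 = medium, 2 = high BMC

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(objective="multi:softprob", n_estimators=300,
                          max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

Once trained on in-situ data, such a model can be applied to gridded predictor rasters to map predicted BMC classes across a study area.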
ABSTRACT
BACKGROUND: Haemoperfusion (HP) is an innovative extracorporeal therapy that uses special cartridges to filter the blood, effectively removing pro-inflammatory cytokines, toxins, and pathogens in COVID-19 patients. This retrospective cohort study aimed to assess the clinical benefits of HP for severe COVID-19 cases using Shapley values for machine learning models. METHODS: The research involved 578 inpatients (≥ 20 years old) admitted to Baqiyatallah hospital (Tehran, Iran). The control group (359 patients) received standard treatment, including high doses of corticosteroids (a single 500 mg methylprednisolone pulse, followed by 250 mg for 2 days), categorized as regimen I. The HP group (219 patients) received regimen II, consisting of the same corticosteroid treatment (regimen I) along with haemoperfusion using Cytosorb H300. The frequency of haemoperfusion sessions varied based on the type of lung involvement determined by chest CT scans. In addition, the Shapley value of the i-th feature for a query point x is defined through a value function v, where the input feature matrix represents individual characteristics, drugs, and the patient's history and clinical condition. RESULTS: Our data showed a favorable clinical response in the HP group compared to the control group. Notably, one to three sessions of HP using the CytoSorb® 300 cartridge led to reduced ventilation requirements and mortality rates in severe COVID-19 patients. Shapley values were calculated to evaluate the contribution of haemoperfusion, among other factors such as side effects, medications, and individual characteristics, to COVID-19 patient outcomes. In addition, there was a significant difference between the two groups in the treatments and medications used: remdesivir, adalimumab, tocilizumab, favipiravir, interferon beta-1a, prophylactic and full-dose enoxaparin, and prophylactic and full-dose heparin (P < 0.05). Haemoperfusion appears to have a positive impact on the reduction of inflammation and renal function markers, such as ferritin and creatinine, respectively; D-dimer and WBC levels in the HP group were also significantly lower than in the control group. CONCLUSION: The findings indicated that haemoperfusion played a crucial role in predicting patient survival, making it a significant feature in classifying patients' prognoses.
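The definition alluded to above is the standard Shapley value; one common form (the paper's exact notation is not reproduced here) is:

```latex
\phi_i(v, x) = \sum_{S \subseteq \{1,\dots,p\} \setminus \{i\}}
\frac{|S|!\,(p - |S| - 1)!}{p!}\,
\bigl[\, v_x(S \cup \{i\}) - v_x(S) \,\bigr]
```

where p is the number of features and v_x(S) is the value function evaluating the model at the query point x with only the features in subset S present.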
Subject(s)
COVID-19, Hemoperfusion, Machine Learning, Humans, Hemoperfusion/methods, Retrospective Studies, Male, Female, Middle Aged, Iran, Adult, Aged, Treatment Outcome, Methylprednisolone/administration & dosage, Methylprednisolone/therapeutic use, COVID-19 Drug Treatment, Corticosteroids/therapeutic use, Corticosteroids/administration & dosage
ABSTRACT
Social interactions are essential for well-being. Therefore, researchers increasingly attempt to capture an individual's social context to predict well-being, including mood. Different tools are used to measure various aspects of the social context. Digital phenotyping is a commonly used technology for assessing a person's social behavior objectively. The experience sampling method (ESM) can capture the subjective perception of specific interactions. Lastly, egocentric networks are often used to measure specific relationship characteristics. These methods capture different aspects of the social context, over different time scales, that are related to well-being, and combining them may be necessary to improve the prediction of well-being. Yet they have rarely been combined in previous research. To address this gap, our study investigates the predictive accuracy of mood based on the social context. We collected intensive within-person data from multiple passive and self-report sources over a 28-day period in a student sample (participants: N = 11; ESM measures: N = 1313). We trained individualized random forest machine learning models, with each model including different predictors summarized over different time scales. Our findings revealed that even when social interaction data from the different methods were combined, the predictive accuracy of mood remained low. The average coefficient of determination across participants was 0.06 for positive and negative affect and ranged from −0.08 to 0.30, indicating substantial variation across people. Furthermore, the optimal set of predictors varied across participants; however, predicting mood using all predictors generally yielded the best predictions. While combining different predictors improved the predictive accuracy of mood for most participants, our study highlights the need for further work using larger and more diverse samples to enhance the clinical utility of these predictive modeling approaches.
Subject(s)
Affect, Ecological Momentary Assessment, Machine Learning, Humans, Female, Male, Young Adult, Adult, Social Networking, Social Interaction, Adolescent, Self Report, Social Environment
ABSTRACT
BACKGROUND: Most prostate cancer (PCa) diagnoses rely on serum prostate-specific antigen (PSA) testing for biopsy confirmation, but its accuracy needs further improvement, and PCa prediction models with high clinical application value remain needed. METHODS: Benign prostatic hyperplasia (BPH) and prostate cancer data were obtained from the Chinese National Clinical Medical Science Data Center for retrospective analysis. The model was constructed using the XGBoost algorithm, with patients' age, body mass index (BMI), PSA-related parameters, and serum biochemical parameters as model variables. Decision curve analysis (DCA) was used to evaluate the clinical utility of the models. The Shapley additive explanations (SHAP) framework was used to analyze the importance ranking and risk thresholds of the variables. RESULTS: A total of 1915 patients were included in this study, of whom 823 (43.0%) were BPH patients and 1092 (57.0%) were PCa patients. The XGBoost model provided better performance (AUC 0.82) than f/tPSA (AUC 0.75), tPSA (AUC 0.68), and fPSA (AUC 0.61), respectively. Based on SHAP values, f/tPSA was the most important variable, and the top five most important biochemical parameter variables were inorganic phosphorus (P), potassium (K), creatine kinase MB isoenzyme (CKMB), low-density lipoprotein cholesterol (LDL-C), and creatinine (Cre). PCa risk thresholds for these risk markers were f/tPSA (0.13), P (1.29 mmol/L), K (4.29 mmol/L), CKMB (11.6 U/L), LDL-C (3.05 mmol/L), and Cre (74.5–99.1 µmol/L). CONCLUSION: The present model has the advantages of widespread availability and high net benefit, especially for underdeveloped countries and regions. Furthermore, these risk thresholds can assist in the diagnosis and screening of prostate cancer in clinical practice.
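Decision curve analysis rests on a simple net-benefit formula; the sketch below, with illustrative variable names (y_true, y_prob), shows how it can be computed for a fitted model such as the one described above.

```python
# Hedged DCA sketch: net benefit = TP/n - FP/n * pt/(1 - pt) at risk threshold pt.
# `y_true` (0 = BPH, 1 = PCa) and `y_prob` (model probabilities) are illustrative names.
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob) >= threshold   # classify as PCa above the threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

thresholds = np.linspace(0.05, 0.95, 19)
# Compare the model against the "biopsy everyone" strategy across thresholds:
# nb_model = [net_benefit(y_true, y_prob, t) for t in thresholds]
# nb_all   = [net_benefit(y_true, np.ones(len(y_true)), t) for t in thresholds]
```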
Subject(s)
Prostatic Hyperplasia, Prostatic Neoplasms, Male, Humans, Prostate-Specific Antigen, Prostatic Hyperplasia/diagnosis, Retrospective Studies, LDL Cholesterol
ABSTRACT
Historically, two primary criticisms statisticians have had of machine learning and deep neural models are their lack of uncertainty quantification and their inability to support inference (i.e., to explain which inputs are important). Explainable AI has developed over the last few years as a sub-discipline of computer science and machine learning to mitigate these concerns (as well as concerns of fairness and transparency in deep modeling). In this article, our focus is on explaining which inputs are important in models for predicting environmental data. In particular, we focus on three general methods for explainability that are model-agnostic and thus applicable across a breadth of models without internal explainability: "feature shuffling", "interpretable local surrogates", and "occlusion analysis". We describe particular implementations of each and illustrate their use with a variety of models, all applied to the problem of long-lead forecasting of monthly soil moisture in the North American corn belt given sea surface temperature anomalies in the Pacific Ocean.
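As an illustration of the first of these methods, the sketch below implements "feature shuffling" (permutation importance) for an arbitrary fitted regression model; it is a generic example, not the authors' implementation.

```python
# Model-agnostic "feature shuffling": the error increase after permuting a column
# measures that feature's importance. Generic sketch, not the paper's code.
import numpy as np
from sklearn.metrics import mean_squared_error

def shuffling_importance(model, X, y, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = mean_squared_error(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break feature j's link to y
            errors.append(mean_squared_error(y, model.predict(Xp)))
        importances[j] = np.mean(errors) - baseline  # error increase = importance
    return importances
```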
ABSTRACT
Protein adaptations to extreme environmental conditions are drivers in biotechnological process optimization and essential to unravelling the molecular limits of life. Most proteins with such desirable adaptations are found in extremophilic organisms inhabiting extreme environments. The deep sea is such an environment and a promising resource that imposes multiple extremes on its inhabitants. Conditions like high hydrostatic pressure and high or low temperature are prevalent, and many deep-sea organisms tolerate several of these extremes. While molecular adaptations to high temperature are comparatively well described, adaptations to other extremes, such as high pressure, are not yet well understood. To fully unravel the molecular mechanisms of individual adaptations, it is probably necessary to disentangle multifactorial adaptations. In this study, we evaluate differences between protein structures from deep-sea organisms and related proteins from non-deep-sea organisms. We created a data collection of 1281 experimental protein structures from 25 deep-sea organisms and paired them with orthologous proteins. We exhaustively evaluate differences between the protein pairs with machine learning and Shapley values to determine characteristic differences in sequence and structure. The results show a reasonable discrimination of deep-sea from non-deep-sea proteins, from which we distinguish correlations previously attributed to thermal stability from other signals potentially describing adaptations to high pressure. While some distinct correlations can be observed, the overall picture appears intricate.
Subject(s)
Physiological Adaptation, Proteins, Cold, Heat, Hydrostatic Pressure, Proteins/metabolism
ABSTRACT
Neuroimaging-driven brain age estimation has become popular for measuring brain aging and identifying neurodegenerative diseases. However, a single estimated brain age (gap) obscures regional variations in brain aging, losing the spatial specificity across diseases that is valuable for early screening. In this study, we combined brain age modeling with Shapley Additive Explanations to measure brain aging as a feature contribution vector reflecting the spatial mechanisms of pathological aging. Specifically, we regressed age on volumetric brain features using machine learning to construct the brain age model, and model-agnostic Shapley values were calculated to attribute regional brain aging in each subject's age estimate, forming the brain age vector. Spatial specificity of the brain age vector was evaluated among groups of normal aging, prodromal Parkinson disease (PD), stable mild cognitive impairment (sMCI), and progressive mild cognitive impairment (pMCI). Machine learning methods were adopted to examine the discriminability of the brain age vector in early disease screening, compared with two other brain aging metrics (the single brain age gap and regional brain age gaps) and brain volumes. Results showed that the proposed brain age vector accurately reflected disorder-specific abnormal aging patterns related to the medial temporal lobe and the striatum for prodromal AD (sMCI vs. pMCI) and PD (healthy controls [HC] vs. prodromal PD), respectively, and demonstrated outstanding performance in early disease screening, with areas under the curve of 83.39% and 72.28% in detecting pMCI and prodromal PD, respectively. In conclusion, the proposed brain age vector effectively improves the spatial specificity of brain aging measurement and enables individual screening for neurodegenerative diseases.
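A minimal sketch of the brain age vector idea follows, using a random forest in place of whatever regressor the study used and synthetic "regional volume" data; the per-region SHAP values for each subject form the vector.

```python
# Sketch: per-region Shapley contributions to an age estimate ("brain age vector").
# The regressor choice and the synthetic data are assumptions for illustration.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                   # 10 hypothetical regional volumes
age = 60 + 5 * X[:, :3].sum(axis=1) + rng.normal(0, 2, 300)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, age)

explainer = shap.TreeExplainer(model)
brain_age_vectors = explainer.shap_values(X)     # one regional attribution per subject

# Regional contributions plus the expected value recover each estimated brain age.
estimated_age = explainer.expected_value + brain_age_vectors.sum(axis=1)
```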
Subject(s)
Alzheimer Disease, Cognitive Dysfunction, Neurodegenerative Diseases, Humans, Alzheimer Disease/pathology, Magnetic Resonance Imaging/methods, Brain/diagnostic imaging, Brain/pathology, Cognitive Dysfunction/pathology, Aging/pathology, Neurodegenerative Diseases/diagnostic imaging, Neurodegenerative Diseases/pathology
ABSTRACT
Blood pressure (BP) is among the most important vital signs. Estimation of absolute BP solely from photoplethysmography (PPG) has gained immense attention in recent years. Published works differ in the features and models used and report widely varying results. This work aims to provide a machine learning method for absolute BP estimation, its interpretation using computational methods, and its critical appraisal against the current literature. We used data from three different sources comprising 273 subjects and 259,986 single beats. We extracted multiple features from PPG signals and their derivatives. BP was estimated by xgboost regression. For interpretation we used Shapley additive explanations (SHAP). Absolute systolic BP estimation using a strict separation of subjects yielded a mean absolute error of 9.456 mmHg and a correlation of 0.730. The results improve markedly if the data separation is changed (MAE: 6.366 mmHg, r: 0.874). Interpretation by means of SHAP revealed four features from the PPG signal, its derivatives, and its decomposition to be most relevant. The presented approach depicts a general way to interpret multivariate prediction algorithms and reveals certain features to be valuable for absolute BP estimation. Our work underlines the considerable impact of data selection and of training/testing separation, which must be considered in detail when algorithms are to be compared. To make our work traceable, we have made all methods available to the public.
Subject(s)
Blood Pressure Determination, Photoplethysmography, Algorithms, Blood Pressure, Blood Pressure Determination/methods, Humans, Photoplethysmography/methods, Pulse Wave Analysis
ABSTRACT
Anthropogenic alterations have resulted in widespread degradation of stream conditions. To aid in stream restoration and management, baseline estimates of conditions and improved explanations of the factors driving their degradation are needed. We used random forests to model biological condition, using a benthic macroinvertebrate index of biotic integrity, for small, non-tidal streams (upstream area ≤200 km²) in the Chesapeake Bay watershed (CBW) of the mid-Atlantic coast of North America. We utilized several global and local model interpretation tools to improve average and site-specific model inferences, respectively. The model was used to predict condition for 95,867 individual catchments for eight periods (2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019). Predicted conditions were classified as Poor, FairGood, or Uncertain to align with management needs, and individual reach lengths and catchment areas were summed by condition class for the CBW for each period. Global permutation and local Shapley importance values indicated that the percentages of forest, development, and agriculture in upstream catchments had strong impacts on predictions. Development and agriculture negatively influenced stream condition at both the model-average (partial dependence [PD] and accumulated local effect [ALE] plots) and local (individual conditional expectation and Shapley value plots) levels. Friedman's H-statistic indicated large overall interactions for these three land covers, and bivariate global plots (PD and ALE) supported interactions between agriculture and development. Total stream length and catchment area predicted in FairGood condition decreased and then increased over the 19 years (length/area: 66.6/65.4% in 2001, 66.3/65.2% in 2011, and 66.6/65.4% in 2019). Examination of individual catchment predictions between 2001 and 2019 showed that catchments predicted to have the largest decreases in condition had large increases in development, whereas catchments predicted to exhibit the largest increases in condition showed moderate increases in forest cover. The use of global and local interpretative methods, together with watershed-wide and individual catchment predictions, supports conservation practitioners who need to identify widespread and localized patterns, especially since management actions typically take place at individual-reach scales.
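The global interpretation tools named above are available in scikit-learn; the sketch below, with synthetic land-cover predictors standing in for the study's variables, shows permutation importance plus one-way and bivariate partial dependence.

```python
# Sketch of global interpretation: permutation importance and partial dependence.
# The three synthetic predictors stand in for % forest, % developed, % agriculture.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(1000, 3))
y = 0.6 * X[:, 0] - 0.4 * X[:, 1] - 0.3 * X[:, 2] \
    - 0.002 * X[:, 1] * X[:, 2] + rng.normal(0, 5, 1000)   # with an interaction term

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print("permutation importances:", imp.importances_mean)

pd_dev = partial_dependence(rf, X, features=[1])        # one-way PD for % developed
pd_pair = partial_dependence(rf, X, features=[(1, 2)])  # bivariate PD for interaction
```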
Subject(s)
Bays, Rivers, Agriculture, Ecosystem, Environmental Monitoring/methods, Machine Learning
ABSTRACT
BACKGROUND: Low sexual desire is the most common sexual problem reported, with 34% of women and 15% of men reporting a lack of desire for at least 3 months in a 12-month period. Sexual desire has previously been associated with both relationship and individual well-being, highlighting the importance of understanding factors that contribute to sexual desire, as improving sexual desire difficulties can help improve an individual's overall quality of life. AIM: The purpose of the present study was to identify the most salient individual (eg, attachment style, attitudes toward sexuality, gender) and relational (eg, relationship satisfaction, sexual satisfaction, romantic love) predictors of dyadic and solitary sexual desire from a large number of predictor variables. METHODS: Previous research has relied primarily on traditional statistical models, which are limited in their ability to estimate a large number of predictors, non-linear associations, and complex interactions. We used a machine learning algorithm, random forest (a highly non-linear ensemble of decision trees), to circumvent these issues and predict dyadic and solitary sexual desire from a large number of predictors across 2 online samples (N = 1,846, including 754 individuals forming 377 couples). We also used a Shapley value technique to estimate the size and direction of the effect of each predictor variable on the model outcome. OUTCOMES: The outcomes included total, dyadic, and solitary sexual desire measured using the Sexual Desire Inventory. RESULTS: The models predicted around 40% of the variance in dyadic and solitary desire, with women's desire being more predictable than men's overall. Several variables consistently predicted dyadic sexual desire, such as sexual satisfaction and romantic love, and solitary desire, such as masturbation and attitudes toward sexuality. These predictors were similar for both men and women, and gender was not an important predictor of sexual desire. CLINICAL TRANSLATION: The results highlight the importance of addressing overall relationship satisfaction when sexual desire difficulties are presented in couples therapy. It is also important to understand clients' attitudes toward sexuality. STRENGTHS & LIMITATIONS: The study improves on existing methodologies in the field and compares a large number of predictors of sexual desire. However, the data were cross-sectional, and variables important for desire may have been absent from the datasets. CONCLUSION: Higher sexual satisfaction and feelings of romantic love toward one's partner are important predictors of dyadic sexual desire, whereas regular masturbation and more permissive attitudes toward sexuality predicted solitary sexual desire. Vowels LM, Vowels MJ, Mark KP. Uncovering the Most Important Factors for Predicting Sexual Desire Using Explainable Machine Learning. J Sex Med 2021;18:1198-1216.
Subject(s)
Libido, Quality of Life, Cross-Sectional Studies, Female, Humans, Machine Learning, Male, Sexual Behavior, Sexual Partners
ABSTRACT
BACKGROUND: Machine learning (ML) can be an effective tool to extract information from attribute-rich molecular datasets for the generation of molecular diagnostic tests. However, the way in which the resulting scores or classifications are produced from the input data may not be transparent. Algorithmic explainability or interpretability has become a focus of ML research. Shapley values, first introduced in game theory, can provide explanations of the result generated from a specific set of input data by a complex ML algorithm. METHODS: For a multivariate molecular diagnostic test in clinical use (the VeriStrat® test), we calculate and discuss the interpretation of exact Shapley values. We also employ standard approximation techniques for Shapley value computation (local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP) based methods) and compare the results with exact Shapley values. RESULTS: Exact Shapley values calculated for data collected from a cohort of 256 patients showed that the relative importance of attributes for test classification varied by sample. While all eight features used in the VeriStrat® test contributed equally to classification for some samples, other samples showed more complex patterns of attribute importance. Exact Shapley values and Shapley-based interaction metrics were able to provide interpretable classification explanations at the sample or patient level, and patient subgroups could be defined by comparing Shapley value profiles between patients. LIME and SHAP approximation approaches, even those seeking to include correlations between attributes, produced results that were quantitatively and, in some cases, qualitatively different from the exact Shapley values. CONCLUSIONS: Shapley values can be used to determine the relative importance of input attributes to the result generated by a multivariate molecular diagnostic test for an individual sample or patient. Patient subgroups defined by Shapley value profiles may motivate translational research. However, correlations inherent in molecular data, and the typically small ML training sets available for molecular diagnostic test development, may cause some approximation methods to produce Shapley value estimates that differ both qualitatively and quantitatively from the exact values. Hence, caution is advised when using approximate methods to evaluate Shapley explanations of the results of molecular diagnostic tests.
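With only eight attributes, exact Shapley values can be computed by full subset enumeration; the sketch below shows the brute-force calculation for a toy value function (not the VeriStrat® classifier).

```python
# Exact Shapley values by subset enumeration: O(2^n), tractable for n = 8 attributes.
# The toy value function below is illustrative, not the diagnostic test's model.
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n):
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

# Toy coalition game with an interaction between attributes 0 and 1.
v = lambda S: (0 in S) + (1 in S) + 0.5 * float(0 in S and 1 in S)
print(exact_shapley(v, 3))   # [1.25, 1.25, 0.0]
```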
Subject(s)
Machine Learning, Molecular Pathology, Algorithms, Cohort Studies, Humans
ABSTRACT
In this paper we apply a series of machine learning models to a recently published unique dataset on the mortality of COVID-19 patients. We use a dataset consisting of blood samples from 375 patients admitted to a hospital in the region of Wuhan, China; 201 patients survived hospitalisation and 174 died in hospital. The focus of the paper is not only on seeing which machine learning model obtains the absolute highest accuracy, but more on the interpretation of what the machine learning models provide. We find that age, days in hospital, and lymphocyte and neutrophil counts are important and robust predictors of a patient's mortality. Furthermore, the algorithms we use allow us to observe the marginal impact of each variable on a case-by-case patient level, which might help practitioners easily detect anomalous patterns. This paper analyses the global and local interpretation of machine learning models on patients with COVID-19.
ABSTRACT
Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of, and confidence in, ML in pharmaceutical research. There is a need for agnostic approaches that aid in the interpretation of ML models regardless of their complexity and that are also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for the exact calculation of Shapley values for decision tree methods and systematically compare this variant with the model-independent SHAP method in compound activity and potency value predictions. Moreover, new applications of the SHAP analysis approach are presented, including the interpretation of DNN models for the generation of multi-target activity profiles and of ensemble regression models for potency prediction.
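To make the comparison concrete, the sketch below contrasts exact tree-based Shapley computation (shap's TreeExplainer) with the model-independent kernel approximation on a synthetic activity-classification task; the data and model choice are illustrative assumptions.

```python
# Exact tree-path Shapley values vs. the model-agnostic kernel approximation.
# Synthetic "compound descriptors" and the model choice are illustrative.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)     # active vs. inactive

model = GradientBoostingClassifier(random_state=0).fit(X, y)

tree_sv = shap.TreeExplainer(model).shap_values(X[:5])   # exact, polynomial time

kernel = shap.KernelExplainer(lambda Z: model.predict_proba(Z)[:, 1],
                              shap.sample(X, 50))        # model-agnostic
kernel_sv = kernel.shap_values(X[:5])                    # sampling approximation
# Note: by default the two explainers operate on different output scales
# (log-odds vs. probability), so values are not directly comparable numerically.
```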
Subject(s)
Drug Discovery, Machine Learning, Neural Networks (Computer), Pharmaceutical Preparations/standards, Humans, Molecular Models, Pharmaceutical Preparations/metabolism, Structure-Activity Relationship, Therapeutic Equivalency
ABSTRACT
To improve the performance of intensive care units (ICUs), the field of biostatistics has developed scores which try to predict the likelihood of negative outcomes. These help evaluate the effectiveness of treatments and clinical practice, and also help to identify patients with unexpected outcomes. However, several studies have shown them to offer sub-optimal performance. Alternatively, deep learning offers state-of-the-art capabilities in certain prediction tasks, and research suggests deep neural networks are able to outperform traditional techniques. Nevertheless, a main impediment to the adoption of deep learning in healthcare is its reduced interpretability, for in this field it is crucial to gain insight into the why of predictions, to assure that models are actually learning relevant features instead of spurious correlations. To address this, we propose a deep multi-scale convolutional architecture trained on the Medical Information Mart for Intensive Care III (MIMIC-III) for mortality prediction, and the use of concepts from coalitional game theory to construct visual explanations that show how important the inputs are deemed by the network. Results show our model attains a ROC AUC of 0.8735 (± 0.0025), which is competitive with the state of the art of deep learning mortality models trained on MIMIC-III data, while remaining interpretable. Supporting code can be found at https://github.com/williamcaicedo/ISeeU.
Subject(s)
Critical Care/methods, Deep Learning, Hospital Mortality, Intensive Care Units, Medical Informatics/methods, Algorithms, Area Under the Curve, Electronic Health Records, Humans, Machine Learning, Neural Networks (Computer), ROC Curve, Reproducibility of Results, Retrospective Studies, Sensitivity and Specificity
ABSTRACT
Objectives: The addition of two-way interactions is a classic problem in statistics and comes with the challenge of quadratically increasing dimension. We aim to (a) devise an estimation method that can handle this challenge and (b) aid interpretation of the resulting model by developing computational tools for quantifying variable importance. Methods: Existing strategies typically overcome the dimensionality problem by only allowing interactions between relevant main effects. Building on this philosophy, and aiming for settings with a moderate n-to-p ratio, we develop a local shrinkage model that links the shrinkage of interaction effects to the shrinkage of their corresponding main effects. In addition, we derive a new analytical formula for the Shapley value, which allows rapid assessment of individual-specific variable importance scores and their uncertainties. Results: We empirically demonstrate that our approach provides accurate estimates of the model parameters and very competitive predictive accuracy. In our Bayesian framework, estimation inherently comes with inference, which facilitates variable selection. Comparisons with key competitors are provided. Large-scale cohort data are used to provide realistic illustrations and evaluations. The implementation of our method in RStan is relatively straightforward and flexible, allowing for adaptation to specific needs. Conclusions: Our method is an attractive alternative to existing strategies for handling interactions in epidemiological and/or clinical studies, as its linked local shrinkage can improve parameter accuracy, prediction, and variable selection. Moreover, it provides appropriate inference and interpretation, and may compete well with less interpretable machine learners in terms of prediction.
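The paper's analytical Shapley formula is not reproduced in the abstract; as a point of reference only, the well-known closed form for a plain linear model with independent features is:

```latex
f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
\quad \Longrightarrow \quad
\phi_i(x) = \beta_i \bigl( x_i - \mathbb{E}[x_i] \bigr)
```

i.e., each attribution is the coefficient times the feature's deviation from its mean; the formula derived in the paper extends analytical Shapley results to the interaction setting with accompanying uncertainties.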
ABSTRACT
Explaining the decisions made by a radiomic model is of significant interest, as it can provide valuable insights into the information learned by complex models and foster trust in well-performing ones, thereby facilitating their clinical adoption. Promising radiomic approaches that aggregate information from multiple regions within an image currently lack suitable explanation tools that could identify the regions that most significantly influence their decisions. Here we present a model- and modality-agnostic tool (RadShap, https://github.com/ncaptier/radshap), based on Shapley values, that explains the predictions of multiregion radiomic models by highlighting the contribution of each individual region. Methods: The explanation tool leverages Shapley values to distribute the aggregative radiomic model's output among all the regions of interest of an image, highlighting their individual contribution. RadShap was validated using a retrospective cohort of 130 patients with advanced non-small cell lung cancer undergoing first-line immunotherapy. Their baseline PET scans were used to build 1,000 synthetic tasks to evaluate the degree of alignment between the tool's explanations and our data generation process. RadShap's potential was then illustrated through 2 real case studies by aggregating information from all segmented tumors: the prediction of the progression-free survival of the non-small cell lung cancer patients and the classification of the histologic tumor subtype. Results: RadShap demonstrated strong alignment with the ground truth, with a median frequency of 94% for consistently explained predictions in the synthetic tasks. In both real-case studies, the aggregative models yielded superior performance to the single-lesion models (average [±SD] time-dependent area under the receiver operating characteristic curve was 0.66 ± 0.02 for the aggregative survival model vs. 0.55 ± 0.04 for the primary tumor survival model). The tool's explanations provided relevant insights into the behavior of the aggregative models, highlighting that for the classification of the histologic subtype, the aggregative model used information beyond the biopsy site to correctly classify patients who were initially misclassified by a model focusing only on the biopsied tumor. Conclusion: RadShap aligned with ground truth explanations and provided valuable insights into radiomic models' behaviors. It is implemented as a user-friendly Python package with documentation and tutorials, facilitating its smooth integration into radiomic pipelines.
Subject(s)
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Radiomics, Humans, Non-Small-Cell Lung Carcinoma/diagnostic imaging, Computer-Assisted Image Processing/methods, Lung Neoplasms/diagnostic imaging, Positron Emission Tomography Computed Tomography, Retrospective Studies, Software
ABSTRACT
Modelling multivariate spatio-temporal data with complex dependency structures is a challenging task, but it can be simplified by assuming that the original variables are generated from independent latent components. If these components are found, they can be modelled univariately. Blind source separation aims to recover the latent components by estimating the unknown linear or nonlinear unmixing transformation from the observed data only. In this paper, we extend the recently introduced identifiable variational autoencoder to the nonlinear, nonstationary spatio-temporal blind source separation setting and demonstrate its performance using comprehensive simulation studies. Additionally, we introduce two alternative methods for latent dimension estimation, a crucial task for obtaining the correct latent representation. Finally, we illustrate the proposed methods using a meteorological application, in which we estimate the latent dimension and the latent components, interpret the components, and show how nonstationarity can be accounted for and prediction accuracy improved by using the proposed nonlinear blind source separation method as a preprocessing step.
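As a point of reference for the linear case, the sketch below recovers independent latent components with FastICA; the paper's identifiable variational autoencoder addresses the harder nonlinear, nonstationary setting that this baseline does not cover.

```python
# Linear blind source separation baseline with FastICA (not the paper's iVAE).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t),                      # smooth source
          np.sign(np.sin(3 * t)),             # square-wave source
          rng.laplace(size=2000)]             # noisy non-Gaussian source
A = rng.normal(size=(3, 3))                   # unknown mixing matrix
X = S @ A.T                                   # observed mixtures

ica = FastICA(n_components=3, random_state=0)
S_hat = ica.fit_transform(X)                  # recovered up to order and scale
```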
ABSTRACT
The synthesis of silver nanoparticles with controlled physicochemical properties is essential for governing their intended functionalities and safety profiles. However, the synthesis process involves multiple parameters that influence the resulting properties. This challenge can be addressed by developing predictive models that forecast endpoints based on key synthesis parameters. In this study, we manually extracted synthesis-related data from the literature and leveraged various machine learning algorithms. Data extraction included parameters such as reactant concentrations and experimental conditions, as well as physicochemical properties. The antibacterial efficiencies and toxicological profiles of the synthesized nanoparticles were also extracted. In a second step, based on data completeness, we employed regression algorithms to establish relationships between synthesis parameters and desired endpoints and to build predictive models. The models for core size and antibacterial efficiency were trained and validated using a cross-validation approach. Finally, the impact of the features was evaluated via Shapley values to provide insights into their contribution to the predictions. Factors such as synthesis duration, scale of synthesis, and the choice of capping agents emerged as the most significant predictors. This study demonstrates the potential of machine learning to aid in the rational design of the synthesis process and paves the way for the development of safe-by-design principles by providing insights into optimizing the synthesis process to achieve the desired properties. Finally, this study provides a valuable dataset compiled from literature sources with significant time and effort from multiple researchers; access to such datasets notably aids computational advances in the field of nanotechnology.
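A minimal sketch of the cross-validated regression step follows; the five synthetic predictors stand in for the extracted synthesis parameters, and the target mimics a core-size endpoint.

```python
# Cross-validated regression sketch for a synthesis-parameter -> core-size model.
# Features and target are synthetic stand-ins for the curated literature data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 5))   # e.g., reactant conc., duration, temperature, pH, scale
y = 20 + 30 * X[:, 1] + 10 * X[:, 0] * X[:, 2] + rng.normal(0, 2, 400)  # core size (nm)

model = GradientBoostingRegressor(random_state=0)
r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2: %.3f +/- %.3f" % (r2.mean(), r2.std()))
```

Feature impacts could then be examined with Shapley values (e.g., shap.TreeExplainer on the fitted model), mirroring the analysis described above.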
ABSTRACT
Introduction: Research in consumer neuroscience has identified computational methods, particularly artificial intelligence (AI) and machine learning, as a significant frontier for advancement. Previously, we utilized functional magnetic resonance imaging (fMRI) and artificial neural networks (ANNs) to model brain processes related to brand preferences in a paradigm free of motor actions. In the current study, we revisit these data, introducing recent advancements in explainable artificial intelligence (xAI) to gain insights into this domain. By integrating fMRI data analysis, machine learning, and xAI, our study aims to identify functional brain networks that support brand perception and, ultimately, brain networks that disentangle preferred from indifferent brands, focusing on the early processing stages. Methods: We applied independent component analysis (ICA) to overcome the expected high dimensionality of the fMRI data, which raises hurdles in AI applications, and extracted pertinent features from the returned ICs. An ANN was then trained on these data, followed by pruning and retraining. We then applied explanation techniques, based on path weights and Shapley values, to make the network more transparent, explainable, and interpretable, and to obtain insights into the underlying brain processes. Results: The fully connected ANN model obtained an accuracy of 54.6%, which dropped to 50.4% after pruning. However, the retraining process allowed it to surpass the fully connected network, achieving an accuracy of 55.9%. The path-weight and Shapley-based analyses confirm the expected initial participation of the primary visual system in brand perception. Other brain areas, such as the cuneal and lateral occipital cortices, also participate in early processing and discriminate between preferred and indifferent brands. Discussion: The most important finding is that a split between processing preferred versus indifferent brands may occur during the early processing stages, still within the visual system. However, we found no evidence of a "decision pipeline" that would determine whether a brand is preferred or indifferent. The results suggest the existence of a "tagging"-like process in parallel flows in the extrastriate cortex. Analysis of the model's hidden layer showed that network training dynamics aggregate specific processes within the hidden nodes; some nodes contribute to both global brand appraisal and specific brand-category classification, shedding light on the neural substrates of decision-making in response to brand stimuli.