Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Hybrid visualization-based framework for depressive state detection and characterization of atypical patients.

Kopitar, Leon; Kokol, Peter; Stiglic, Gregor.

J Biomed Inform ; 147: 104535, 2023 11.

Article in English | MEDLINE | ID: mdl-37926393

ABSTRACT

INTRODUCTION: Depression is a global concern, with a significant number of people affected worldwide, particularly in low- and middle-income countries. The rising prevalence of depression emphasizes the importance of early detection and understanding the origins of such conditions. OBJECTIVE: This paper proposes a framework for detecting depression using a hybrid visualization approach that combines local and global interpretation. This approach aims to assist in model adaptation, provide insights into patient characteristics, and evaluate prediction model suitability in a different environment. METHODS: This study utilizes R programming language with the Caret, ggplot2, Plotly, and Dalex libraries for model training, visualization, and interpretation. Data from the NHANES repository was used for secondary data analysis. The NHANES repository is a comprehensive source for examining health and nutrition of individuals in the United States, and covers demographic, dietary, medication use, lifestyle choices, reproductive and mental health data. Penalized logistic regression models were built using NHANES 2015-2018 data, while NHANES 2019-March 2020 data was used for evaluation at the global-specific and local level interpretation. RESULTS: The prediction model that supports this framework achieved an average AUC score of 0.748 (95% CI: 0.743-0.752), with minimal variability in sensitivity and specificity. CONCLUSION: The built-in prediction model highlights chest pain, the ratio of family income to poverty, and smoking status as crucial features for predicting depressive states in both the original and local environments.

Subjects

Diet, Poverty, Humans, United States, Nutrition Surveys, Logistic Models
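
The abstract above reports an average AUC with a 95% confidence interval. As a minimal sketch of how such a summary can be computed, the following uses the rank-based definition of AUC and a normal-approximation interval over repeated runs. The function names `auc` and `mean_ci95` are hypothetical, and the normal approximation is an assumption; the abstract does not state how the authors derived their interval.

```python
from math import sqrt
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% interval across repeated runs
    (assumption: the paper's own interval may be computed differently)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width
```

Calling `auc` once per resampled or cross-validated run and passing the resulting list to `mean_ci95` gives a summary in the same form as the one quoted in the abstract.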

Using heterogeneous sources of data and interpretability of prediction models to explain the characteristics of careless respondents in survey data.

Kopitar, Leon; Stiglic, Gregor.

Sci Rep ; 13(1): 13417, 2023 08 17.

Article in English | MEDLINE | ID: mdl-37591974

ABSTRACT

Prior to further processing, completed questionnaires must be screened for the presence of careless respondents. Different people will respond to surveys in different ways. Some take the easy path and fill out the survey carelessly. The proportion of careless respondents determines the survey's quality. As a result, identifying careless respondents is critical for the quality of obtained results. This study aims to explore the characteristics of careless respondents in survey data and evaluate the predictive power and interpretability of different types of data and indices of careless responding. The research question focuses on understanding the behavior of careless respondents and determining the effectiveness of various data sources in predicting their responses. Data from a three-month web-based survey on participants' personality traits such as honesty-humility, emotionality, extraversion, agreeableness, conscientiousness and openness to experience was used in this study. Data for this study was taken from Schroeders et al. The gradient boosting machine-based prediction model uses data from the answers, time spent for answering, demographic information on the respondents as well as some indices of careless responding from all three types of data. Prediction models were evaluated with tenfold cross-validation repeated a hundred times. Prediction models were compared based on balanced accuracy. Models' explanations were provided with Shapley values. Compared with existing work, data fusion from multiple types of information had no noticeable effect on the performance of the gradient boosting machine model. Variables such as "I would never take a bribe, even if it was a lot", average longstring, and total intra-individual response variability were found to be useful in distinguishing careless respondents. However, variables like "I would be tempted to use counterfeit money if I could get away with it" and intra-individual response variability of the first section of a survey showed limited effectiveness. Additionally, this study indicated that, whereas the psychometric synonym score has an immediate effect and is designed with the goal of identifying careless respondents when combined with other variables, it is not necessarily the optimal choice for fitting a gradient boosting machine model.

Subjects

Brassicaceae, Humans, Extraversion (Psychological), Psychometrics, Research Design
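
The abstract above names several indices of careless responding (longstring, average longstring, intra-individual response variability) and compares models by balanced accuracy. The sketch below illustrates the commonly used definitions of these quantities on a single respondent's answer vector. The function names are hypothetical, and the definitions are assumed to match those used in the study, which the abstract does not spell out.

```python
from statistics import mean, stdev


def run_lengths(responses):
    """Lengths of runs of identical consecutive answers, in order."""
    runs = []
    for i, answer in enumerate(responses):
        if i > 0 and answer == responses[i - 1]:
            runs[-1] += 1
        else:
            runs.append(1)
    return runs


def longstring(responses):
    """Longest run of identical consecutive answers."""
    return max(run_lengths(responses))


def average_longstring(responses):
    """Mean length of the runs of identical consecutive answers."""
    return mean(run_lengths(responses))


def irv(responses):
    """Intra-individual response variability: the sample standard
    deviation of one respondent's answers (needs at least two answers)."""
    return stdev(responses)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = careless)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)
```

A respondent who answers `[3, 3, 3, 1, 2, 2]` has runs of length 3, 1 and 2, so the longstring is 3 and the average longstring is 2. Balanced accuracy is a natural comparison metric here because careless respondents are usually a minority class.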

Early detection of type 2 diabetes mellitus using machine learning-based prediction models.

Kopitar, Leon; Kocbek, Primoz; Cilar, Leona; Sheikh, Aziz; Stiglic, Gregor.

Sci Rep ; 10(1): 11981, 2020 07 20.

Article in English | MEDLINE | ID: mdl-32686721

ABSTRACT

Most screening tests for T2DM in use today were developed using multivariate regression methods that are often further simplified to allow transformation into a scoring formula. The increasing volume of electronically collected data opened the opportunity to develop more complex, accurate prediction models that can be continuously updated using machine learning approaches. This study compares machine learning-based prediction models (i.e. Glmnet, RF, XGBoost, LightGBM) to commonly used regression models for prediction of undiagnosed T2DM. The performance in prediction of fasting plasma glucose level was measured using 100 bootstrap iterations in different subsets of data simulating new incoming data in 6-month batches. With 6 months of data available, simple regression model performed with the lowest average RMSE of 0.838, followed by RF (0.842), LightGBM (0.846), Glmnet (0.859) and XGBoost (0.881). When more data were added, Glmnet improved with the highest rate (+ 3.4%). The highest level of variable selection stability over time was observed with LightGBM models. Our results show no clinically relevant improvement when more sophisticated prediction models were used. Since higher stability of selected variables over time contributes to simpler interpretation of the models, interpretability and model calibration should also be considered in development of clinical prediction models.

Subjects

Type 2 Diabetes Mellitus/diagnosis, Early Diagnosis, Machine Learning, Biological Models, Area Under Curve, Blood Glucose/metabolism, Calibration, Type 2 Diabetes Mellitus/blood, Fasting/blood, Female, Humans, Male, Middle Aged
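
The abstract above compares models by average RMSE over 100 bootstrap iterations. A minimal sketch of that kind of summary follows, assuming the bootstrap resamples pairs of observed and predicted values from an evaluation set. The abstract does not describe the exact resampling scheme, so `bootstrap_rmse` and its parameters are illustrative assumptions rather than the authors' procedure.

```python
import random
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def bootstrap_rmse(y_true, y_pred, iterations=100, seed=0):
    """Average RMSE over bootstrap resamples of (observed, predicted) pairs.

    Assumption: pairs are resampled with replacement from one evaluation
    set; the study may instead resample the training data.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return mean(scores)
```

Running `bootstrap_rmse` once per model on the same evaluation data gives averages that can be ranked in the way the abstract ranks the regression, RF, LightGBM, Glmnet and XGBoost models.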
