1.
J Med Internet Res ; 25: e44030, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37140973

ABSTRACT

The use of artificial intelligence (AI) and big data in medicine has increased in recent years. Indeed, the use of AI in mobile health (mHealth) apps could considerably assist both individuals and health care professionals in the prevention and management of chronic diseases, in a person-centered manner. Nonetheless, there are several challenges that must be overcome to provide high-quality, usable, and effective mHealth apps. Here, we review the rationale and guidelines for the implementation of mHealth apps and the challenges regarding quality, usability, and user engagement and behavior change, with a special focus on the prevention and management of noncommunicable diseases. We suggest that a cocreation-based framework is the best method to address these challenges. Finally, we describe the current and future roles of AI in improving personalized medicine and provide recommendations for developing AI-based mHealth apps. We conclude that the implementation of AI and mHealth apps for routine clinical practice and remote health care will not be feasible until we overcome the main challenges regarding data privacy and security, quality assessment, and the reproducibility and uncertainty of AI results. Moreover, there is a lack of both standardized methods to measure the clinical outcomes of mHealth apps and techniques to encourage user engagement and behavior changes in the long term. We expect that in the near future, these obstacles will be overcome and that the ongoing European project, Watching the risk factors (WARIFA), will provide considerable advances in the implementation of AI-based mHealth apps for disease prevention and health promotion.


Subjects
Mobile Applications; Telemedicine; Humans; Artificial Intelligence; Reproducibility of Results; Telemedicine/methods; Risk Factors
2.
Artif Intell Med ; 138: 102508, 2023 04.
Article in English | MEDLINE | ID: mdl-36990585

ABSTRACT

Bacterial resistance to antibiotics has been increasing rapidly, resulting in low antibiotic effectiveness even when treating common infections. The presence of resistant pathogens in environments such as a hospital Intensive Care Unit (ICU) exacerbates the critical problem of admission-acquired infections. This work focuses on the prediction of antibiotic resistance in Pseudomonas aeruginosa nosocomial infections at the ICU, using Long Short-Term Memory (LSTM) artificial neural networks as the predictive method. The analyzed data were extracted from the Electronic Health Records (EHR) of patients admitted to the University Hospital of Fuenlabrada from 2004 to 2019 and were modeled as Multivariate Time Series. A data-driven dimensionality reduction method is built by adapting three feature importance techniques from the literature to the considered data and proposing an algorithm for selecting the most appropriate number of features. This is done using LSTM sequential capabilities so that the temporal aspect of features is taken into account. Furthermore, an ensemble of LSTMs is used to reduce the variance in performance. Our results indicate that the patient's admission information, the antibiotics administered during the ICU stay, and the previous antimicrobial resistance are the most important risk factors. Compared to other conventional dimensionality reduction schemes, our approach is able to improve performance while reducing the number of features for most of the experiments. In essence, the proposed framework achieves, in a computationally cost-efficient manner, promising results for supporting decisions in this clinical task, characterized by high dimensionality, data scarcity, and concept drift.


Subjects
Anti-Bacterial Agents; Bacterial Infections; Humans; Anti-Bacterial Agents/therapeutic use; Drug Resistance, Bacterial; Bacterial Infections/drug therapy; Neural Networks, Computer; Intensive Care Units
3.
IEEE J Biomed Health Inform ; 27(6): 2670-2680, 2023 06.
Article in English | MEDLINE | ID: mdl-35930509

ABSTRACT

The increasing prevalence of chronic non-communicable diseases makes it a priority to develop tools for enhancing their management. In this regard, Artificial Intelligence algorithms have proven to be successful in early diagnosis, prediction and analysis in the medical field. Nonetheless, two main issues arise when dealing with medical data: the lack of high-fidelity datasets and the preservation of patients' privacy. To face these problems, different techniques of synthetic data generation have emerged as a possible solution. In this work, a framework based on synthetic data generation algorithms was developed. Eight medical datasets containing tabular data were used to test this framework. Three different statistical metrics were used to analyze the preservation of synthetic data integrity, and six different synthetic data generation sizes were tested. In addition, the generated synthetic datasets were used to train four different supervised Machine Learning classifiers, both alone and combined with the real data. F1-score was used to evaluate classification performance. The main goal of this work is to assess the feasibility of the use of synthetic data generation in medical data in two ways: preservation of data integrity and maintenance of classification performance.


Subjects
Artificial Intelligence; Machine Learning; Humans; Algorithms; Supervised Machine Learning; Benchmarking
4.
BioData Min ; 15(1): 18, 2022 Sep 05.
Article in English | MEDLINE | ID: mdl-36064616

ABSTRACT

BACKGROUND: The number of patients with chronic diseases such as diabetes and hypertension has reached alarming levels worldwide. These diseases increase the risk of developing acute complications and involve a substantial economic burden and demand for health resources. The widespread adoption of Electronic Health Records (EHRs) is opening great opportunities for supporting decision-making. Nevertheless, data extracted from EHRs are complex (heterogeneous, high-dimensional and usually noisy), hampering knowledge extraction with conventional approaches. METHODS: We propose the use of the Denoising Autoencoder (DAE), a Machine Learning (ML) technique that transforms high-dimensional data into latent representations (LRs), thus addressing the main challenges with clinical data. We explore in this work how the combination of LRs with a visualization method can be used to map the patient data in a two-dimensional space, gaining knowledge about the distribution of patients with different chronic conditions. Furthermore, this representation can also be used to characterize the evolution of the patient's health status, which is of paramount importance in the clinical setting. RESULTS: To obtain clinical LRs, we considered real-world data extracted from EHRs linked to the University Hospital of Fuenlabrada in Spain. Experimental results showed the great potential of DAEs to identify patients with clinical patterns linked to hypertension, diabetes and multimorbidity. The procedure allowed us to find patients with the same main chronic disease but different clinical characteristics. Thus, we identified two kinds of diabetic patients with differences in their drug therapy (insulin-dependent and non-insulin-dependent), and also a group of women affected by hypertension and gestational diabetes. We also present a proof of concept for mapping the health status evolution of synthetic patients when considering the most significant diagnoses and drugs associated with chronic patients.
CONCLUSION: Our results highlighted the value of ML techniques to extract clinical knowledge, supporting the identification of patients with certain chronic conditions. Furthermore, the patient's health status progression on the two-dimensional space might be used as a tool for clinicians aiming to characterize health conditions and identify their more relevant clinical codes.

5.
Artif Intell Med ; 122: 102211, 2021 12.
Article in English | MEDLINE | ID: mdl-34823836

ABSTRACT

Electronic health records (EHRs) are a valuable data source that, in conjunction with deep learning (DL) methods, have provided important outcomes in different domains, contributing to supporting decision-making. Owing to the remarkable advancements achieved by DL-based models, autoencoders (AE) are becoming extensively used in health care. Nevertheless, AE-based models are based on nonlinear transformations, resulting in black-box models leading to a lack of interpretability, which is vital in the clinical setting. To obtain insights from AE latent representations, we propose a methodology by combining probabilistic models based on Gaussian mixture models and hierarchical clustering supported by Kullback-Leibler divergence. To validate the methodology from a clinical viewpoint, we used real-world data extracted from EHRs of the University Hospital of Fuenlabrada (Spain). Records were associated with healthy and chronic hypertensive and diabetic patients. Experimental outcomes showed that our approach can find groups of patients with similar health conditions by identifying patterns associated with diagnosis and drug codes. This work opens up promising opportunities for interpreting representations obtained by the AE-based model, bringing some light to the decision-making process made by clinical experts in daily practice.


Subjects
Electronic Health Records; Models, Statistical; Cluster Analysis; Humans; Normal Distribution
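
The abstract of record 5 mentions hierarchical clustering supported by Kullback-Leibler divergence between Gaussian mixture components. As a minimal sketch only, the closed-form divergence between two Gaussians can be computed as below. The diagonal-covariance restriction, the symmetrised variant, and the function names are assumptions made for illustration; the abstract does not state which form the authors used.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (assumed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension term of the closed-form Gaussian KL divergence.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrised KL, one possible dissimilarity for hierarchical clustering."""
    return (kl_diag_gaussians(mu_p, var_p, mu_q, var_q)
            + kl_diag_gaussians(mu_q, var_q, mu_p, var_p))
```

A symmetrised value is often preferred as a clustering dissimilarity because plain KL divergence depends on the order of its arguments.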
6.
IEEE J Biomed Health Inform ; 25(12): 4340-4353, 2021 12.
Article in English | MEDLINE | ID: mdl-34591775

ABSTRACT

The COVID-19 pandemic presents unprecedented challenges to the healthcare systems around the world. In 2020, Spain was among the countries with the highest Intensive Care Unit (ICU) hospitalization and mortality rates. This work analyzes data of COVID-19 patients admitted to a Spanish ICU during the first wave of the pandemic. The patients in our study either died (deceased patients) or were discharged from the ICU (non-deceased patients) and underwent the following landmarks: beginning of symptoms; arrival at the emergency department; beginning of the hospital stay; and ICU admission. Our goal is to create a graph-based data-science methodology to find associations among patients' comorbidities, previous medication, symptoms, and the COVID-19 treatment, and to analyze their evolution across landmarks. Towards that end, we first perform a hypothesis test based on bootstrap to identify discriminative features among deceased and non-deceased patients. Then, we leverage graph-based representations and network analytics to determine pairwise associations and complex relations among clinical features. The descriptive statistical analysis confirms that deceased patients exhibit multiple comorbidities with stronger levels of association and are treated with a wider range of drugs during the ICU stay. We also observe that the most common treatment was the simultaneous administration of lopinavir/ritonavir with hydroxychloroquine, regardless of the patients' outcome. Our results illustrate how graph tools and representations yield insights on the relations among comorbidities, drug treatments, and patients' evolution. All in all, the approach puts forth a new data-analysis tool for clinicians that can be applied to analyze (post-COVID) symptom/patient evolution.


Subjects
COVID-19 Drug Treatment; Hospital Mortality; Hospitalization; Hospitals; Humans; Intensive Care Units; Pandemics; SARS-CoV-2
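
The abstract of record 6 describes a bootstrap-based hypothesis test for finding features that discriminate between deceased and non-deceased patients. The sketch below shows one common way such a test can be set up for a binary feature (1 = present, 0 = absent). The test statistic (absolute difference in proportions), the pooled resampling scheme, and all names are assumptions for illustration, not the authors' procedure.

```python
import random


def bootstrap_p_value(group_a, group_b, n_boot=2000, seed=0):
    """Bootstrap p-value for a difference in proportions between two groups.

    Resamples from the pooled data, which imitates the null hypothesis that
    the feature is equally frequent in both groups (assumed design).
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_boot):
        res_a = rng.choices(pooled, k=len(group_a))
        res_b = rng.choices(pooled, k=len(group_b))
        diff = abs(sum(res_a) / len(res_a) - sum(res_b) / len(res_b))
        if diff >= observed:
            extreme += 1
    # The +1 terms keep the estimate strictly above zero.
    return (extreme + 1) / (n_boot + 1)
```

A small p-value suggests the feature is distributed differently in the two groups; in practice a correction for testing many features would also be needed.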
7.
Inform Health Soc Care ; 46(4): 355-369, 2021 Dec 02.
Article in English | MEDLINE | ID: mdl-33792475

ABSTRACT

Objective: Given the association between vitamin D deficiency and risk for cardiovascular disease, we used machine learning approaches to establish a model to predict the probability of deficiency. Determination of serum levels of 25-hydroxy vitamin D (25(OH)D) provided the best assessment of vitamin D status, but such tests are not always widely available or feasible. Thus, our study established predictive models with high sensitivity to identify patients either unlikely to have vitamin D deficiency or who should undergo 25(OH)D testing. Methods: We collected data from 1002 hypertensive patients from a Spanish university hospital. The elastic net regularization approach was applied to reduce the dimensionality of the dataset. The issue of determining vitamin D status was addressed as a classification problem; thus, the following classifiers were applied: logistic regression, support vector machine (SVM), random forest, naive Bayes, and Extreme Gradient Boost methods. Classification accuracy, sensitivity, specificity, and predictive values were computed to assess the performance of each method. Results: The SVM-based method with radial kernel performed better than the other algorithms in terms of sensitivity (98%), negative predictive value (71%), and classification accuracy (73%). Conclusion: The combination of a feature-selection method such as elastic net regularization and a classification approach produced well-fitted models. The SVM approach yielded better predictions than the other algorithms. This combination approach allowed us to develop a predictive model with high sensitivity but low specificity, to identify the population that could benefit from laboratory determination of serum levels of 25(OH)D.


Subjects
Machine Learning; Vitamin D Deficiency; Algorithms; Bayes Theorem; Humans; Logistic Models; Support Vector Machine; Vitamin D Deficiency/diagnosis; Vitamin D Deficiency/epidemiology
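
Several abstracts in this list, including record 7, report sensitivity, specificity, predictive values, and accuracy. For reference, these follow directly from the four cells of a binary confusion matrix, as in the short sketch below. The function name and the example counts are hypothetical and are not taken from any of the studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=90, fp=30, tn=70, fn=10)
```

A screening model tuned for high sensitivity, as described in record 7, accepts more false positives (lower specificity) in exchange for missing fewer true cases.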
8.
Antibiotics (Basel) ; 10(3)2021 Feb 27.
Article in English | MEDLINE | ID: mdl-33673564

ABSTRACT

Multi-drug resistance (MDR) is currently one of the greatest threats to the global health system. This situation is especially relevant in Intensive Care Units (ICUs), where the critical health status of patients makes them more vulnerable. Since MDR confirmation by the microbiology laboratory usually takes 48 h, we propose several artificial intelligence approaches to gain insight into MDR risk factors during the first 48 h after ICU admission. We considered clinical and demographic features, mechanical ventilation and the antibiotics taken by the patients during this time interval. Three feature selection strategies were applied to identify statistically significant differences between MDR and non-MDR patient episodes, ending up with 24 selected features. Among them, SAPS III and APACHE II scores, the age and the department of origin were identified. Considering these features, we analyzed the potential of machine learning methods for predicting whether a patient will develop an MDR germ during the first 48 h after ICU admission. Though the results presented here are just a first incursion into this problem, artificial intelligence approaches have a great impact in this scenario, especially when enriching the set of features from the electronic health records.

9.
BMC Bioinformatics ; 21(Suppl 2): 92, 2020 Mar 11.
Article in English | MEDLINE | ID: mdl-32164533

ABSTRACT

BACKGROUND: Chronic diseases are becoming more widespread each year in developed countries, mainly due to increasing life expectancy. Among them, diabetes mellitus (DM) and essential hypertension (EH) are two of the most prevalent ones. Furthermore, they can be the onset of other chronic conditions such as kidney or obstructive pulmonary diseases. The need to comprehend the factors related to such complex diseases motivates the development of interpretative and visual analysis methods, such as classification trees, which not only provide predictive models for diagnosing patients, but can also help to discover new clinical insights. RESULTS: In this paper, we analyzed healthy and chronic (diabetic, hypertensive) patients associated with the University Hospital of Fuenlabrada in Spain. Each patient was classified into a single health status according to clinical risk groups (CRGs). The CRGs characterize a patient through features such as age, gender, diagnosis codes, and drug codes. Based on these features and the CRGs, we have designed classification trees to determine the most discriminative decision features among different health statuses. In particular, we propose to make use of statistical data visualizations to guide the selection of features in each node when constructing a tree. We created several classification trees to distinguish among patients with different health statuses. We analyzed their performance in terms of classification accuracy, and drew clinical conclusions regarding the decision features considered in each tree. As expected, healthy patients and patients with a single chronic condition were better classified than patients with comorbidities. The constructed classification trees also show that the use of antipsychotics and the diagnosis of chronic airway obstruction are relevant for classifying patients with more than one chronic condition, in conjunction with the usual DM and/or EH diagnoses. 
CONCLUSIONS: We propose a methodology for constructing classification trees in a visually guided manner. The approach allows clinicians to progressively select the decision features at each of the tree nodes. The process is guided by exploratory data analysis visualizations, which may provide new insights and unexpected clinical information.


Subjects
Decision Trees; Diabetes Mellitus/classification; Hypertension/classification; Chronic Disease; Databases, Factual; Diabetes Mellitus/diagnosis; Health Status; Humans; Hypertension/diagnosis
10.
Med Biol Eng Comput ; 58(5): 991-1002, 2020 May.
Article in English | MEDLINE | ID: mdl-32100174

ABSTRACT

Prediabetes is a type of hyperglycemia in which patients have blood glucose levels above normal but below the threshold for type 2 diabetes mellitus (T2DM). Prediabetic patients are considered to be at high risk for developing T2DM, but not all will eventually do so. Because it is difficult to identify which patients have an increased risk of developing T2DM, we developed a model of several clinical and laboratory features to predict the development of T2DM within a 2-year period. We used a supervised machine learning algorithm to identify at-risk patients from among 1647 obese, hypertensive patients. The study period began in 2005 and ended in 2018. We constrained data up to 2 years before the development of T2DM. Then, using a time series analysis with the features of every patient, we calculated one linear regression line and one slope per feature. Features were then included in a K-nearest neighbors classification model. Feature importance was assessed using the random forest algorithm. The K-nearest neighbors model accurately classified patients in 96% of cases, with a sensitivity of 99%, specificity of 78%, positive predictive value of 96%, and negative predictive value of 94%. The random forest algorithm selected the homeostatic model assessment-estimated insulin resistance, insulin levels, and body mass index as the most important factors, which in combination with KNN had an accuracy of 99% with a sensitivity of 99% and specificity of 97%. We built a prognostic model that accurately identified obese, hypertensive patients at risk for developing T2DM within a 2-year period. Clinicians may use machine learning approaches to better assess risk for T2DM and better manage hypertensive patients. Machine learning algorithms may help health care providers make more informed decisions.


Subjects
Diabetes Mellitus, Type 2; Hypertension; Models, Statistical; Obesity; Adult; Aged; Algorithms; Diabetes Mellitus, Type 2/complications; Diabetes Mellitus, Type 2/diagnosis; Diabetes Mellitus, Type 2/epidemiology; Female; Humans; Hypertension/complications; Hypertension/epidemiology; Machine Learning; Male; Middle Aged; Obesity/complications; Obesity/epidemiology; Sensitivity and Specificity
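
The abstract of record 10 summarises each patient's time series by a per-feature regression slope and then feeds these features to a K-nearest neighbors classifier. A minimal sketch of that two-step idea follows. The function names, the Euclidean distance, the majority vote, and the default k are illustrative assumptions; the abstract does not give these details.

```python
import math
from collections import Counter


def slope(times, values):
    """Least-squares slope of values against times for one feature."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In use, each patient would contribute one vector of slopes (one per clinical feature), and `knn_predict` would be called on that vector. Features on different scales would normally be standardised first so that no single feature dominates the distance.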
11.
Metab Syndr Relat Disord ; 18(2): 79-85, 2020 03.
Article in English | MEDLINE | ID: mdl-31928513

ABSTRACT

Aim: The primary objective of our research was to compare the performance of data analysis to predict vitamin D deficiency using three different regression approaches and to evaluate the usefulness of incorporating machine learning algorithms into the data analysis in a clinical setting. Methods: We included 221 patients from our hypertension unit, whose data were collected from electronic records dated between 2006 and 2017. We used classical stepwise logistic regression, and two machine learning methods [least absolute shrinkage and selection operator (LASSO) and elastic net]. We assessed the performance of these three algorithms in terms of sensitivity, specificity, misclassification error, and area under the curve (AUC). Results: LASSO and elastic net regression performed better than logistic regression in terms of AUC, which was significantly better in both penalized methods, with AUC = 0.76 and AUC = 0.74 for elastic net and LASSO, respectively, than in logistic regression, with AUC = 0.64. In terms of misclassification rate, elastic net (18%) outperformed LASSO (22%) and logistic regression (25%). Conclusion: Compared with a classical logistic regression approach, penalized methods were found to have better performance in predicting vitamin D deficiency. The use of machine learning algorithms such as LASSO and elastic net may significantly improve the prediction of vitamin D deficiency in a hypertensive obese population.


Subjects
Data Mining; Hypertension/diagnosis; Obesity/diagnosis; Vitamin D Deficiency/diagnosis; Biomarkers/blood; Cross-Sectional Studies; Electronic Health Records; Female; Humans; Hypertension/blood; Hypertension/epidemiology; Logistic Models; Machine Learning; Male; Middle Aged; Obesity/blood; Obesity/epidemiology; Prevalence; Retrospective Studies; Risk Assessment; Risk Factors; Spain/epidemiology; Vitamin D Deficiency/blood; Vitamin D Deficiency/epidemiology
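
Records 11 and 12 compare LASSO and elastic net against unpenalized regression. For orientation, both penalties can be written as one objective. The parameterization below, with a mixing weight \alpha and overall strength \lambda, is a common convention and is an assumption here; the abstracts do not state which form was used. \ell denotes the model's log-likelihood (logistic or Cox partial likelihood).

```latex
\hat{\beta} = \arg\min_{\beta_0,\beta}\;
  -\frac{1}{n}\,\ell(\beta_0,\beta)
  + \lambda\Big[\alpha\,\|\beta\|_1 + \frac{1-\alpha}{2}\,\|\beta\|_2^2\Big],
  \qquad 0 \le \alpha \le 1
```

Setting \alpha = 1 gives LASSO, which can shrink coefficients exactly to zero and so performs feature selection; intermediate values of \alpha give the elastic net, which tends to keep groups of correlated features together.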
12.
Med Biol Eng Comput ; 57(9): 2011-2026, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31346948

ABSTRACT

Appropriate management of hypertensive patients relies on the accurate identification of clinically relevant features. However, traditional statistical methods may ignore important information in datasets or overlook possible interactions among features. Machine learning may improve the prediction accuracy and interpretability of regression models by identifying the most relevant features in hypertensive patients. We sought the most relevant features for prediction of cardiovascular (CV) events in a hypertensive population. We used the penalized regression models least absolute shrinkage and selection operator (LASSO) and elastic net (EN) to obtain the most parsimonious and accurate models. The clinical parameters and laboratory biomarkers were collected from the clinical records of 1,471 patients receiving care at Mostoles University Hospital. The outcome was the development of major adverse CV events. Cox proportional hazards regression was performed alone and with penalized regression analyses (LASSO and EN), producing three models. The modeling was performed using 10-fold cross-validation to fit the penalized models. The three predictive models were compared and statistically analyzed to assess their classification accuracy, sensitivity, specificity, discriminative power, and calibration accuracy. The standard Cox model identified five relevant features, while LASSO and EN identified only three (age, LDL cholesterol, and kidney function). The accuracies of the models (prediction vs. observation) were 0.767 (Cox model), 0.754 (LASSO), and 0.764 (EN), and the areas under the curve were 0.694, 0.670, and 0.673, respectively. However, pairwise comparison of performance yielded no statistically significant differences. All three calibration curves showed close agreement between the predicted and observed probabilities of the development of a CV event. 
Although the performance was similar for all three models, both penalized regression analyses produced models with good fit and fewer features than the Cox regression predictive model but with the same accuracy. This case study of predictive models using penalized regression analyses shows that penalized regularization techniques can provide predictive models for CV risk assessment that are parsimonious, highly interpretable, and generalizable and that have good fit. For clinicians, a parsimonious model can be useful where available data are limited, as such a model can offer a simple but efficient way to model the impact of the different features on the prediction of CV events. Management of these features may lower the risk for a CV event. Graphical abstract: In a clinical setting, with numerous biological and laboratory features and incomplete datasets, traditional statistical methods may ignore important information and overlook possible interactions among features. Our aim was to identify the most relevant features to predict cardiovascular events in a hypertensive population, using three different regression approaches for feature selection, to improve the prediction accuracy and interpretability of regression models by identifying the relevant features in these patients.


Subjects
Cardiovascular Diseases/etiology; Hypertension/complications; Models, Cardiovascular; Adult; Age Factors; Aged; Cholesterol, LDL/blood; Databases, Factual; Female; Humans; Hypertension/physiopathology; Kidney Function Tests; Male; Middle Aged; Proportional Hazards Models; ROC Curve; Regression Analysis; Reproducibility of Results; Risk Assessment
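
The abstract of record 12 states that the penalized models were fitted with 10-fold cross-validation. The sketch below shows how such folds can be generated. It is only an illustration of the splitting step: the shuffling, the seed, and the function name are assumptions, and the model fitting itself is omitted.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Striding assigns every k-th shuffled index to the same fold.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each sample appears in exactly one test fold, so every observation is used once for validation and k - 1 times for fitting. The penalty strength is typically chosen as the value with the best average validation performance across folds.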
13.
Comput Math Methods Med ; 2019: 2059851, 2019.
Article in English | MEDLINE | ID: mdl-30915154

ABSTRACT

This study describes a novel approach to solve the surgical site infection (SSI) classification problem. Feature engineering has traditionally been one of the most important steps in solving complex classification problems, especially in cases with temporal data. The described novel approach is based on abstraction of temporal data recorded in three temporal windows. Maximum likelihood L1-norm (lasso) regularization was used in penalized logistic regression to predict the onset of surgical site infection based on available patient blood testing results up to the day of surgery. Prior knowledge of predictors (blood tests) was integrated into the modelling by introducing penalty factors depending on blood test prices and an early stopping parameter limiting the maximum number of selected features used in predictive modelling. Finally, solutions resulting in higher interpretability and cost-effectiveness were demonstrated. Using repeated holdout cross-validation, the baseline C-reactive protein (CRP) classifier achieved a mean AUC of 0.801, whereas our best full lasso model achieved a mean AUC of 0.956. The best test results were achieved by the full lasso model with the maximum number of features limited to 20, with an AUC of 0.967. The presented models showed the potential not only to support domain experts in their decision making but also to prove invaluable for improving the prediction of SSI occurrence, which may even help in setting new guidelines in the field of preoperative SSI prevention and surveillance.


Subjects
C-Reactive Protein/analysis; Cost-Benefit Analysis; Medical Informatics/methods; Surgical Wound Infection/diagnosis; Surgical Wound Infection/economics; Algorithms; Area Under Curve; Data Interpretation, Statistical; Decision Trees; Female; Gastrointestinal Tract/surgery; Humans; Likelihood Functions; Logistic Models; Male; Norway; Preoperative Period; Regression Analysis; Reproducibility of Results; Risk Factors; Time Factors
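
The abstract of record 13 describes lasso-penalized logistic regression with penalty factors that depend on blood test prices. One way to express this idea is a weighted L1 penalty, sketched below. The weights w_j and the assumption that they increase with the price of test j are illustrative; the abstract does not give the exact mapping from price to penalty factor.

```latex
\hat{\beta} = \arg\min_{\beta_0,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
  + \lambda\sum_{j=1}^{p} w_j\,|\beta_j|
```

Under this form, a more expensive test needs to contribute more predictive value before its coefficient becomes non-zero, which steers the selected model toward cheaper predictors.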
14.
Entropy (Basel) ; 21(6)2019 Jun 18.
Article in English | MEDLINE | ID: mdl-33267317

ABSTRACT

The presence of bacteria with resistance to specific antibiotics is one of the greatest threats to the global health system. According to the World Health Organization, antimicrobial resistance has already reached alarming levels in many parts of the world, involving a social and economic burden for the patient, for the system, and for society in general. Because of the critical health status of patients in the intensive care unit (ICU), time is critical to identify bacteria and their resistance to antibiotics. Since common antibiotic resistance tests require between 24 and 48 h after the culture is collected, we propose to apply machine learning (ML) techniques to determine whether a bacterium will be resistant to different families of antimicrobials. For this purpose, clinical and demographic features from the patient, as well as data from cultures and antibiograms, are considered. From a population point of view, we also show graphically the relationship between different bacteria and families of antimicrobials by performing correspondence analysis. Results of the ML techniques reveal non-linear relationships that help to identify antimicrobial resistance at the ICU, with performance dependent on the family of antimicrobials. A change in the trend of antimicrobial resistance is also observed.

15.
Entropy (Basel) ; 21(4)2019 Apr 19.
Article in English | MEDLINE | ID: mdl-33267133

ABSTRACT

Customer Relationship Management (CRM) is a fundamental tool in the hospitality industry nowadays, which can be seen as a big-data scenario due to the large number of records handled annually by managers. Data quality is crucial for the success of these systems, and one of the main issues to be solved by businesses in general and by hospitality businesses in particular in this setting is the identification of duplicated customers, which has not received much attention in recent literature, probably in part because it is not an easy-to-state problem in statistical terms. In the present work, we address the problem statement of duplicated customer identification as a large-scale data analysis, and we propose and benchmark a general-purpose solution for it. Our system consists of four basic elements: (a) A generic feature representation for the customer fields in a simple table-shape database; (b) An efficient distance for comparison among feature values, in terms of the Wagner-Fischer algorithm to calculate the Levenshtein distance; (c) A big-data implementation using basic map-reduce techniques to readily support the comparison of strategies; (d) An X-from-M criterion to identify those possible neighbors to a duplicated-customer candidate. We analyze the mass density function of the distances in the CRM text-based fields and characterize their behavior and consistency in terms of the entropy and of the mutual information for these fields. Our experiments in a large CRM from a multinational hospitality chain show that the distance distributions are statistically consistent for each feature, and that neighbourhood thresholds are automatically adjusted by the system at a first step and can subsequently be more finely tuned according to the manager's experience.
The entropy distributions for the different variables, as well as the mutual information between pairs, are characterized by multimodal profiles, where a wide gap between close and far fields is often present. This motivates the proposal of the so-called X-from-M strategy, which is shown to be computationally affordable, and can provide the expert with a reduced number of duplicated candidates to supervise, with low X values being enough to warrant the sensitivity required at the automatic detection stage. The proposed system again supports the benefits of big-data technologies in CRM scenarios for hotel chains, and rather than the use of ad-hoc heuristic rules, it promotes the research and development of theoretically principled approaches.
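
The abstract of record 15 names the Wagner-Fischer algorithm for the Levenshtein distance and an X-from-M criterion. The sketch below implements the standard Wagner-Fischer dynamic program with two rolling rows. The `x_from_m_match` helper reflects one plausible reading of X-from-M (two records are candidate duplicates when at least X of their M fields fall within a per-field distance threshold); that reading, the helper name, and the thresholds are assumptions, not the authors' definition.

```python
def levenshtein(a, b):
    """Levenshtein edit distance via the Wagner-Fischer dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def x_from_m_match(rec_a, rec_b, thresholds, x):
    """True when at least x of the M fields are within their threshold."""
    close = sum(1 for fa, fb, t in zip(rec_a, rec_b, thresholds)
                if levenshtein(fa, fb) <= t)
    return close >= x
```

Keeping only two rows of the dynamic-programming table reduces memory from O(len(a) * len(b)) to O(len(b)), which matters when many field pairs are compared in a large customer database.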

16.
Diabetes Metab Syndr ; 12(5): 625-629, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29661604

ABSTRACT

BACKGROUND: The aim of our study was to determine whether prediabetes increases cardiovascular (CV) risk compared with non-prediabetic patients in our hypertensive population. Once this was established, the objective was to identify relevant CV prognostic features among prediabetic individuals. METHODS: We included 1652 hypertensive patients. The primary outcome was a composite of incident CV events: cardiovascular death, stroke, heart failure and myocardial infarction. We performed a Cox proportional hazards regression to assess the CV risk of prediabetic patients compared with non-prediabetic patients and to produce a survival model in the prediabetic cohort. RESULTS: The risk of developing a CV event was higher in the prediabetic cohort than in the non-prediabetic cohort, with a hazard ratio (HR) = 1.61, 95% CI 1.01-2.54, p = 0.04. Our Cox proportional hazards model selected age (HR = 1.04, 95% CI 1.02-1.07, p < 0.001) and cystatin C (HR = 2.4, 95% CI 1.26-4.22, p = 0.01) as the most relevant prognostic features in our prediabetic patients. CONCLUSIONS: Prediabetes was associated with an increased risk of CV events compared with non-prediabetic patients. Age and cystatin C were identified as significant risk factors for CV events in the prediabetic cohort.


Subjects
Cardiovascular Diseases/blood , Cystatin C/blood , Hypertension/blood , Prediabetic State/blood , Adult , Age Factors , Aged , Biomarkers/blood , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/epidemiology , Cohort Studies , Electronic Health Records , Female , Follow-Up Studies , Humans , Hypertension/diagnosis , Hypertension/epidemiology , Male , Middle Aged , Population Surveillance/methods , Prediabetic State/diagnosis , Prediabetic State/epidemiology , Risk Assessment/methods
17.
Article in English | MEDLINE | ID: mdl-29494497

ABSTRACT

Many indices have been proposed for cardiovascular risk stratification from electrocardiogram (ECG) signal processing, but they remain of limited use in clinical practice. We created a system integrating the clinical definition of cardiac risk subdomains from ECGs and the use of diverse signal processing techniques. Three subdomains were defined from the joint analysis of the technical and clinical viewpoints. One subdomain was devoted to demographic and clinical data. The other two subdomains were intended to obtain widely defined risk indices from ECG monitoring: a simple domain (heart rate turbulence (HRT)) and a complex domain (heart rate variability (HRV)). Data provided by the three subdomains allowed for the generation of alerts of different intensity and nature, as well as for the grouping and scrutiny of patients according to the established processing and risk-thresholding criteria. The implemented system was tested by connecting data from real-world in-hospital electronic health records and ECG monitoring, using standards for syntactic interoperability (HL7 messages) and semantic interoperability (archetypes based on CEN/ISO EN13606 and SNOMED-CT). The system was able to provide risk indices and to generate alerts in the health records to support decision-making. Overall, the system allows for the agile interaction of research and clinical practice in the Holter-ECG-based cardiac risk domain.


Subjects
Cardiovascular Diseases/diagnosis , Decision Support Systems, Clinical , Electrocardiography , Electronic Health Records , Heart Rate/physiology , Aged , Cardiovascular Diseases/physiopathology , Female , Humans , Male , Middle Aged , Risk Assessment
18.
Comput Methods Programs Biomed ; 152: 105-114, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29054250

ABSTRACT

OBJECTIVES: Postoperative delirium is a common complication after major surgery among the elderly. Despite its potentially serious consequences, the complication often goes undetected and undiagnosed. To provide diagnosis support, one could exploit the information hidden in free-text documents from electronic health records using data-driven clinical decision support tools. However, these tools depend on labeled training data, which can be both time-consuming and expensive to create. METHODS: The recent learning with anchors framework resolves this problem by transforming key observations (anchors) into labels. This is a promising framework, but it relies heavily on clinicians' knowledge to specify good anchor choices in order to perform well. In this paper we propose a novel method for specifying anchors from free-text documents, following an exploratory data analysis approach based on clustering and data visualization techniques. We investigate the use of the new framework as a way to detect postoperative delirium. RESULTS: By applying the proposed method to medical data gathered from a Norwegian university hospital, we increase the area under the precision-recall curve from 0.51 to 0.96 compared to baselines. CONCLUSIONS: The proposed approach can be used as a framework for clinical decision support for postoperative delirium.


Subjects
Delirium/diagnosis , Electronic Health Records , Postoperative Complications , Aged , Decision Support Systems, Clinical , Delirium/complications , Humans , Norway
19.
Sci Rep ; 7: 46226, 2017 04 07.
Article in English | MEDLINE | ID: mdl-28387314

ABSTRACT

With an aging patient population and increasing complexity in patient disease trajectories, physicians are often met with complex patient histories from which clinical decisions must be made. Due to the increasing rate of adverse events and hospitals facing financial penalties for readmission, there has never been a greater need to enforce evidence-led medical decision-making using available health care data. In the present work, we studied a cohort of 7,741 patients, of whom 4,080 were diagnosed with cancer, surgically treated at a University Hospital in the years 2004-2012. We have developed a methodology that allows disease trajectories of the cancer patients to be estimated from free text in electronic health records (EHRs). By using these disease trajectories, we predict 80% of patient events ahead of time. By controlling for confounders among 8,326 quantified events, we identified 557 events that constitute high subsequent risks (risk > 20%), including six events for cancer and seven events for metastasis. We believe that the presented methodology and findings could be used to improve clinical decision support and personalize trajectories, thereby decreasing adverse events and optimizing cancer treatment.


Subjects
Electronic Health Records , Neoplasms/epidemiology , Confounding Factors, Epidemiologic , Decision Support Systems, Clinical , Disease Progression , Health Status , Humans , Morbidity , Neoplasms/diagnosis , Norway
20.
J Biomed Inform ; 61: 87-96, 2016 06.
Article in English | MEDLINE | ID: mdl-26980235

ABSTRACT

OBJECTIVE: In this work, we have developed a learning system capable of exploiting information conveyed by longitudinal Electronic Health Records (EHRs) for the prediction of a common postoperative complication, Anastomosis Leakage (AL), in a data-driven way and by fusing temporal population data from different and heterogeneous sources in the EHRs. MATERIAL AND METHODS: We used linear and non-linear kernel methods individually for each data source, and we leveraged multiple kernel methods for their effective combination. To validate the system, we used data from the EHR of the gastrointestinal department at a university hospital. RESULTS: We first investigated the early prediction performance from each data source separately, by computing Area Under the Curve values for processed free text (0.83), blood tests (0.74), and vital signs (0.65). When the heterogeneous data sources were combined using the composite kernel framework, the prediction capabilities increased considerably (0.92). Finally, posterior probabilities were evaluated for risk assessment of patients as an aid for clinicians to raise alertness at an early stage, so that they can act promptly to avoid AL complications. DISCUSSION: Machine-learning statistical models built from EHR data can be useful for predicting surgical complications. The combination of EHR-extracted free text, blood sample values, and patient vital signs improves the model performance. These results can be used as a framework for preoperative clinical decision support.
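
To make the idea of a composite kernel over heterogeneous sources more concrete, the sketch below combines one kernel per data source into a single similarity value. The abstract does not state how the kernels were combined, so the weighted sum used here is an assumption (it is one common construction, and non-negative weights keep the result a valid kernel). The source names, feature vectors, kernel choices, and weights are all hypothetical.

```python
import math


def linear_kernel(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def composite_kernel(x, y, kernels, weights):
    """Weighted sum of per-source kernels.

    x and y map a source name to that patient's feature vector;
    kernels and weights are keyed by the same source names.
    """
    return sum(weights[s] * kernels[s](x[s], y[s]) for s in kernels)


# Hypothetical per-source features for two patients.
patient_a = {"free_text": [1.0, 2.0], "blood_tests": [0.5], "vital_signs": [1.0]}
patient_b = {"free_text": [2.0, 0.0], "blood_tests": [0.5], "vital_signs": [1.0]}

kernels = {"free_text": linear_kernel, "blood_tests": rbf_kernel, "vital_signs": rbf_kernel}
weights = {"free_text": 0.5, "blood_tests": 0.3, "vital_signs": 0.2}  # assumed, not fitted

print(composite_kernel(patient_a, patient_b, kernels, weights))
```

In practice the combined kernel matrix would be passed to a kernel classifier such as a support vector machine, and the weights would be selected on validation data rather than fixed by hand.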


Subjects
Digestive System Surgical Procedures , Electronic Health Records , Postoperative Complications , Anastomotic Leak , Colon/surgery , Humans , Models, Statistical , Rectum/surgery , Risk Assessment , Support Vector Machine