ABSTRACT
BACKGROUND: Diagnosis prediction aims to predict the future health status of patients according to their historical electronic health records (EHR), which is an important yet challenging task in healthcare informatics. Existing diagnosis prediction approaches mainly employ recurrent neural networks (RNN) with attention mechanisms to make predictions. However, these approaches ignore the importance of code descriptions, i.e., the medical definitions of diagnosis codes. We believe that taking diagnosis code descriptions into account can help state-of-the-art models not only to learn meaningful code representations, but also to improve predictive performance, especially when the EHR data are insufficient. METHODS: We propose a simple but general diagnosis prediction framework, which includes two basic components: diagnosis code embedding and a predictive model. To learn interpretable code embeddings, we apply convolutional neural networks (CNN) to model medical descriptions of diagnosis codes extracted from online medical websites. The learned medical embedding matrix is used to embed the input visits into vector representations, which are fed into the predictive models. Any existing diagnosis prediction approach (referred to as the base model) can be cast into the proposed framework as the predictive model (called the enhanced model). RESULTS: We conduct experiments on two real medical datasets: the MIMIC-III dataset and the Heart Failure claim dataset. Experimental results show that the enhanced diagnosis prediction approaches significantly improve the prediction performance. Moreover, we validate the effectiveness of the proposed framework with insufficient EHR data. Finally, we visualize the learned medical code embeddings to show the interpretability of the proposed framework. CONCLUSIONS: Given the historical visit records of a patient, the proposed framework is able to predict the next visit information by incorporating medical code descriptions.
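The two components described under METHODS can be illustrated with a minimal sketch. It assumes a single convolution layer with ReLU and max-over-time pooling to turn the words of a code description into one code embedding, and it assumes a visit is embedded by summing the embeddings of its codes. The function names, the toy word vectors, and the summation step are illustrative assumptions, not details taken from the abstract.

```python
def conv_max_pool(word_vecs, filters, window):
    """Embed one code description: 1-D convolution over the word sequence,
    ReLU, then max-over-time pooling (one output value per filter)."""
    pooled = []
    for weights in filters:  # each filter is a flat list of length window * dim
        best = 0.0  # starting at 0.0 applies the ReLU floor
        for start in range(len(word_vecs) - window + 1):
            # Concatenate the word vectors inside the sliding window.
            patch = [x for vec in word_vecs[start:start + window] for x in vec]
            activation = sum(w * x for w, x in zip(weights, patch))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def embed_visit(code_ids, code_embeddings):
    """Embed a visit as the sum of its code embeddings (assumed aggregation)."""
    dim = len(next(iter(code_embeddings.values())))
    visit = [0.0] * dim
    for code in code_ids:
        for i, value in enumerate(code_embeddings[code]):
            visit[i] += value
    return visit


# Toy 2-dimensional word vectors for a hypothetical three-word description.
description = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
# Two hypothetical filters spanning a window of two words.
filters = [[1.0, 0.0, 0.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
code_vector = conv_max_pool(description, filters, window=2)
```

In a real system the filters and word vectors would be learned jointly with the predictive model; here they are fixed by hand only to keep the example deterministic.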
Subjects
Clinical Coding; Electronic Health Records; Forecasting; Heart Failure/diagnosis; Medical Informatics Computing; Neural Networks, Computer; Datasets as Topic; Deep Learning; Heart Failure/classification; Humans; Models, Statistical; Prognosis
ABSTRACT
Computerized adaptive testing (CAT) is a widely embraced approach for delivering personalized educational assessments, tailoring each test to the real-time performance of individual examinees. Despite its potential advantages, CAT's application in small-scale assessments has been limited due to the complexities associated with calibrating the item bank using sparse response data and small sample sizes. This study addresses these challenges by developing a two-step item bank calibration strategy that leverages the 1-bit matrix completion method in conjunction with two distinct incomplete pretesting designs. We introduce two novel 1-bit matrix completion-based imputation methods specifically designed to tackle the issues associated with item calibration in the presence of sparse response data and limited sample sizes. To demonstrate the effectiveness of these approaches, we conduct a comparative assessment against several established item parameter estimation methods capable of handling missing data. This evaluation is carried out through two sets of simulation studies, each featuring different pretesting designs, item bank structures, and sample sizes. Furthermore, we illustrate the practical application of the methods investigated, using empirical data collected from small-scale assessments.
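The imputation idea can be sketched under stated assumptions. The abstract does not describe the two proposed methods, so the code below is not a reproduction of them: it swaps in a plain low-rank logistic factorization fitted by gradient ascent on the observed 0/1 responses, whereas 1-bit matrix completion is usually formulated with a nuclear-norm constraint. All function names, the rank, the learning rate, and the toy response matrix are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(U, V, i, j):
    """Low-rank logit for examinee i and item j."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def observed_log_lik(resp, U, V):
    """Bernoulli log-likelihood over observed (non-None) responses only."""
    total = 0.0
    for i, row in enumerate(resp):
        for j, y in enumerate(row):
            if y is not None:
                p = sigmoid(logit(U, V, i, j))
                total += math.log(p if y == 1 else 1.0 - p)
    return total


def init_factors(n_rows, n_cols, rank, seed=0):
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    return U, V


def fit(resp, U, V, lr=0.05, steps=200, l2=0.01):
    """Full-batch gradient ascent on the L2-penalized observed likelihood."""
    rank = len(U[0])
    for _ in range(steps):
        gU = [[-l2 * u for u in row] for row in U]
        gV = [[-l2 * v for v in row] for row in V]
        for i, row in enumerate(resp):
            for j, y in enumerate(row):
                if y is None:
                    continue
                err = y - sigmoid(logit(U, V, i, j))
                for k in range(rank):
                    gU[i][k] += err * V[j][k]
                    gV[j][k] += err * U[i][k]
        U = [[u + lr * g for u, g in zip(ru, rg)] for ru, rg in zip(U, gU)]
        V = [[v + lr * g for v, g in zip(rv, rg)] for rv, rg in zip(V, gV)]
    return U, V


def impute(resp, U, V):
    """Keep observed responses; fill missing ones with the more likely value."""
    return [[y if y is not None else int(sigmoid(logit(U, V, i, j)) >= 0.5)
             for j, y in enumerate(row)] for i, row in enumerate(resp)]


# Toy incomplete pretest: rows are examinees, columns are items, None is missing.
responses = [[1, 1, None, 0], [1, None, 0, 0], [None, 0, 1, 1], [0, 0, 1, None]]
U0, V0 = init_factors(4, 4, rank=2)
U, V = fit(responses, U0, V0)
completed = impute(responses, U, V)
```

In a two-step strategy of the kind the abstract describes, a completed matrix such as `completed` would then be passed to an ordinary item parameter estimation routine.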
Subjects
Computer Simulation; Educational Measurement; Psychometrics; Educational Measurement/statistics & numerical data; Educational Measurement/methods; Humans; Calibration; Psychometrics/statistics & numerical data; Models, Statistical; Sample Size; Data Interpretation, Statistical
ABSTRACT
Health risk prediction aims to forecast the potential health risks that patients may face using their historical Electronic Health Records (EHR). Although several effective models have been developed, data insufficiency is a key issue undermining their effectiveness. Various data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through learning underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability. The source code is available via https://shorturl.at/aerT0.
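The abstract does not state which diffusion formulation MedDiffusion uses. As a hedged illustration of how a diffusion model perturbs data during training, the sketch below assumes the standard DDPM forward (noising) process with a linear beta schedule, applied to a hypothetical visit embedding: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The function names, schedule endpoints, and the example embedding are assumptions and are not taken from the MedDiffusion source code.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.
    Requires num_steps >= 2."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def noise_embedding(x0, alpha_bar, rng):
    """Sample x_t from the forward process and return it with the noise used.
    A denoising network would be trained to recover eps from x_t."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    keep, add = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [keep * x + add * e for x, e in zip(x0, eps)]
    return xt, eps


# Hypothetical 3-dimensional visit embedding, noised at step index 50 of 100.
schedule = alpha_bar_schedule(100)
visit_embedding = [0.5, -1.0, 2.0]
noisy, noise = noise_embedding(visit_embedding, schedule[50], random.Random(0))
```

Later steps in the schedule retain less of the original embedding, which is what lets the reverse (denoising) process generate new samples starting from noise.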