ABSTRACT
The adoption of electronic health records in hospitals has made available large datasets that can be used to predict medical complications. Patient trajectories in real-world settings are highly variable, which makes longitudinal data modeling challenging. In recent years, significant progress has been made in deep learning models for time series; however, the application of these models to irregular medical time series (IMTS) remains limited. To address this issue, we developed a generic deep-learning-based framework for modeling IMTS that facilitates comparative studies of sequential neural networks (transformers and long short-term memory networks) and irregular time representation techniques. A validation study to predict retinopathy complications was conducted on 1207 patients with type 1 diabetes in a French database using their historical glycosylated hemoglobin measurements, without any data aggregation or imputation. The transformer-based model combined with the soft one-hot representation of time gaps achieved the highest score: an area under the receiver operating characteristic curve of 88.65%, a specificity of 85.56%, a sensitivity of 83.33%, and an improvement of 11.7% over the same architecture without time information. This is the first attempt to predict retinopathy complications in patients with type 1 diabetes using deep learning and longitudinal data collected from patient visits. This study highlighted the importance of modeling time gaps between medical records to improve prediction performance, and the utility of a generic framework for conducting extensive comparative studies.
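As a rough illustration of what encoding time gaps without aggregation or imputation can look like, the sketch below turns irregular visit dates into gap vectors. It assumes that a "soft one-hot" representation spreads each gap over a few bins through a softmax of distances to bin centers; the abstract does not define the technique, so the function `soft_one_hot`, the bin centers, the `temperature` parameter, and the visit days are all hypothetical.

```python
import math

def soft_one_hot(gap_days, centers, temperature=30.0):
    """Spread a time gap over bins: softmax of negative distances to bin centers.

    Assumed formulation for illustration only, not the authors' definition.
    """
    scores = [-abs(gap_days - c) / temperature for c in centers]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical bin centers, in days between consecutive measurements.
CENTERS = [30.0, 90.0, 180.0, 365.0]

# Made-up irregular visit days for one patient (days since first visit).
visit_days = [0, 95, 200, 540]

# Gaps between consecutive visits; no resampling or imputation is applied.
gaps = [later - earlier for earlier, later in zip(visit_days, visit_days[1:])]

# One soft vector per gap, which a sequence model could take alongside
# the measurement recorded at that visit.
encoded = [soft_one_hot(g, CENTERS) for g in gaps]

for gap, vector in zip(gaps, encoded):
    print(gap, [round(v, 3) for v in vector])
```

Unlike a hard one-hot bin, a gap that falls between two centers keeps weight on both, so nearby gaps receive similar vectors.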
Subject(s)
Deep Learning , Diabetes Mellitus, Type 1 , Retinal Diseases , Humans , Diabetes Mellitus, Type 1/complications , Diabetes Mellitus, Type 1/diagnosis , Machine Learning , Neural Networks, Computer
ABSTRACT
OBJECTIVE: The objective of this article was to compare the performance of deep learning and conventional machine learning (ML) methods for detecting health care-associated infections (HAIs) in French medical reports. METHODS: The corpus consisted of different types of medical reports (discharge summaries, surgery reports, consultation reports, etc.). A total of 1,531 medical text documents were extracted and deidentified in three French university hospitals. Each of them was labeled as presence (1) or absence (0) of HAI. We started by normalizing the records using a list of preprocessing techniques. We calculated an overall performance metric, the F1 score, to compare a deep learning method (convolutional neural network [CNN]) with the most popular conventional ML models (Bernoulli and multinomial naïve Bayes, k-nearest neighbors, logistic regression, random forests, extra-trees, gradient boosting, support vector machines). We applied Bayesian hyperparameter optimization to each model based on its HAI identification performance. We included the text representation as an additional hyperparameter for each model, using four different text representations (bag of words, term frequency-inverse document frequency, word2vec, and GloVe). RESULTS: The CNN outperformed all conventional ML algorithms for HAI classification. The best F1 score of 97.7% ± 3.6% and the best area under the curve score of 99.8% ± 0.41% were achieved when the CNN was applied directly to the processed clinical notes without a pretrained word2vec embedding. Through receiver operating characteristic curve analysis, we could achieve a good balance between false notifications (with a specificity equal to 0.937) and system detection capability (with a sensitivity equal to 0.962) using Youden's index as the reference. CONCLUSIONS: The main drawback of CNNs is their opacity. To address this issue, we investigated the activation values of the CNN's inner layers to visualize the most meaningful phrases in a document. This method could be used to build a phrase-based medical assistant algorithm to help the infection control practitioner select relevant medical records. Our study demonstrated that a deep learning approach outperforms the other classification algorithms for automatically identifying HAIs in medical reports.
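The operating point described above, chosen with Youden's index, can be sketched as follows. Youden's index is J = sensitivity + specificity - 1, and the selected threshold is the one that maximizes J. The labels and scores below are made-up toy values, not data from the study, and the function names `youden_threshold` and `f1_at` are hypothetical.

```python
def youden_threshold(y_true, y_score):
    """Return (J, threshold, sensitivity, specificity) maximizing Youden's J.

    A document is predicted positive when its score is >= threshold.
    On ties, the lowest threshold is kept.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best = None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, t, sensitivity, specificity)
    return best

def f1_at(y_true, y_score, threshold):
    """F1 score at a given threshold: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return 2 * tp / (2 * tp + fp + fn)

# Toy labels (1 = HAI present, 0 = absent) and classifier scores.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]

j, threshold, sensitivity, specificity = youden_threshold(y_true, y_score)
f1 = f1_at(y_true, y_score, threshold)
print(j, threshold, sensitivity, specificity, round(f1, 3))
```

Because J weights sensitivity and specificity equally, it suits a screening setting in which missed infections and false notifications are treated as equally costly; a different cost balance would call for a different criterion.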