Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records.

Nguyen, Binh P; Pham, Hung N; Tran, Hop; Nghiem, Nhung; Nguyen, Quang H; Do, Trang T T; Tran, Cao Truong; Simpson, Colin R

Affiliation

Nguyen BP; School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand. Electronic address: b.nguyen@vuw.ac.nz.
Pham HN; School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam.
Tran H; School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand.
Nghiem N; Department of Public Health, University of Otago, 23A Mein Street, Wellington 6021, New Zealand.
Nguyen QH; School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam.
Do TTT; Institute for Infocomm Research, Agency for Science, Technology and Research, 1 Fusionopolis Way, Singapore 138632, Singapore.
Tran CT; Faculty of Information Technology, Le Quy Don Technical University, 236 Hoang Quoc Viet Street, Hanoi 100000, Vietnam.
Simpson CR; Faculty of Health, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand; Usher Institute, The University of Edinburgh, Edinburgh, EH8 9AG, United Kingdom.

Comput Methods Programs Biomed; 182: 105055, 2019 Dec.

Article in En | MEDLINE | ID: mdl-31505379

ABSTRACT

OBJECTIVE: Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Current methods of treating diabetes are inadequate and costly, so prevention is an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs) for individuals or populations have become important tools for understanding developing trends of diseases. Using EHRs to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model, which combines the strength of a generalised linear model with various features and a deep feed-forward neural network, to improve the prediction of the onset of type 2 diabetes mellitus (T2DM).

MATERIALS AND METHODS: The proposed method was implemented by training various models to minimise a logistic loss function using stochastic gradient descent. We applied this model to public hospital record data provided by Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, of whom 1904 have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data obtained from previous years (2009-2011). Class imbalance was handled by applying the Synthetic Minority Oversampling Technique (SMOTE) to each cross-validation training fold, in order to analyse performance when synthetic examples of the minority class are created. We used SMOTE at 150 and 300 percent, where 300 percent means that three new synthetic instances are created for each minority class instance. This results in approximate diabetes:non-diabetes distributions in the training set of 1:2 and 1:1, respectively.

RESULTS: Our final ensemble model without SMOTE obtained an accuracy of 84.28%, an area under the receiver operating characteristic curve (AUC) of 84.13%, a sensitivity of 31.17% and a specificity of 96.85%. Using SMOTE at 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively) with a moderate decrease in specificity (90.16% and 76.59%, respectively).

DISCUSSION AND CONCLUSIONS: Our algorithm has further optimised the prediction of diabetes onset using a novel state-of-the-art machine learning algorithm: the wide and deep learning neural network architecture.
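
The class ratios quoted for SMOTE at 150 and 300 percent can be checked with simple arithmetic. The sketch below is illustrative only: it uses the patient counts given in the abstract and applies the percentages to the whole dataset, whereas the abstract describes oversampling within each cross-validation training fold, so the per-fold counts would be smaller while the ratios stay roughly the same. The function name `minority_after_smote` is hypothetical and is not taken from the paper.

```python
# Worked arithmetic for the SMOTE percentages quoted in the abstract.
# Counts come from the abstract; applying them to the full dataset
# (rather than to each training fold) is a simplifying assumption.

N_TOTAL = 9948                 # de-identified patients in the dataset
N_T2DM = 1904                  # patients diagnosed with T2DM (minority class)
N_NON_T2DM = N_TOTAL - N_T2DM  # remaining patients (majority class)


def minority_after_smote(n_minority: int, percent: int) -> int:
    """Minority-class size after adding `percent`% synthetic instances."""
    return n_minority + n_minority * percent // 100


for percent in (150, 300):
    n_min = minority_after_smote(N_T2DM, percent)
    ratio = N_NON_T2DM / n_min  # majority instances per minority instance
    print(f"SMOTE {percent}%: {n_min} T2DM vs {N_NON_T2DM} non-T2DM "
          f"(about 1:{ratio:.2f})")
```

On the full dataset this gives 4760 and 7616 minority instances against 8044 majority instances, which is broadly in line with the approximate 1:2 and 1:1 distributions stated in the abstract.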

Subject(s)

Deep Learning; Diabetes Mellitus, Type 2/diagnosis; Electronic Health Records; Humans; Machine Learning

Key words

Electronic health records; Incidence; Onset; Prediction; Type 2 diabetes mellitus; Wide and deep learning

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Diabetes Mellitus, Type 2 / Electronic Health Records / Deep Learning
Type of study: Prognostic_studies / Risk_factors_studies
Limits: Humans
Language: En
Journal: Comput Methods Programs Biomed
Journal subject: INFORMATICA MEDICA
Year: 2019
Document type: Article
Country of publication: Ireland
