ABSTRACT
AIMS: Machine learning models can use image and text data to predict the number of years since diabetes diagnosis; such a model could be applied to new patients to estimate, approximately, how long they may have lived with diabetes unknowingly. We aimed to develop a model to predict self-reported diabetes duration. METHODS: We used the Brazilian Multilabel Ophthalmological Dataset. The unit of analysis was the fundus image and its metadata, regardless of the patient. We included people aged 40+ years and fundus images without diabetic retinopathy. Fundus images and metadata (sex, age, comorbidities and insulin use) were passed to the MedCLIP model to extract embedding representations. The embeddings were passed to an Extra Trees classifier to predict four classes: 0-4, 5-9, 10-14 and 15+ years with self-reported diabetes. RESULTS: There were 988 images from 563 people (mean age = 67 years; 64% were women). Overall, the F1 score was 57%. The group with 15+ years of self-reported diabetes had the highest precision (64%) and F1 score (63%), while the highest recall (69%) was observed in the 0-4 years group. The proportion of correctly classified observations was 55% for the 0-4 years group, 51% for 5-9 years, 58% for 10-14 years, and 64% for 15+ years with self-reported diabetes. CONCLUSIONS: The machine learning model had acceptable accuracy and F1 score, and correctly classified more than half of the patients according to diabetes duration. Using large foundation models to extract image and text embeddings seems a feasible and efficient approach to predicting years lived with self-reported diabetes.
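As a minimal sketch of this pipeline — assuming the open-source MedCLIP package and scikit-learn, and using a hypothetical file path, metadata sentence and placeholder feature matrix rather than the study's actual data — the embedding extraction and classification could look roughly as follows:

    import numpy as np
    from PIL import Image
    from medclip import MedCLIPModel, MedCLIPProcessor, MedCLIPVisionModelViT
    from sklearn.ensemble import ExtraTreesClassifier

    # Load the pretrained MedCLIP vision-language model (ViT backbone).
    processor = MedCLIPProcessor()
    model = MedCLIPModel(vision_cls=MedCLIPVisionModelViT)
    model.from_pretrained()

    # One fundus image plus its metadata expressed as a short sentence
    # (path and wording are hypothetical placeholders).
    image = Image.open("fundus_001.png")
    text = "woman, 67 years old, hypertension, taking insulin"
    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)

    # Concatenate image and text embeddings into one feature vector.
    x = np.concatenate([
        outputs["img_embeds"].detach().numpy().ravel(),
        outputs["text_embeds"].detach().numpy().ravel(),
    ])

    # With a full feature matrix X and duration labels y, an Extra Trees
    # classifier predicts the diabetes-duration class (placeholder rows here;
    # the real matrix has one row per fundus image).
    X = np.stack([x, x])
    y = np.array(["0-4", "15+"])
    clf = ExtraTreesClassifier(n_estimators=500, random_state=0).fit(X, y)
    print(clf.predict(X[:1]))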
Subject(s)
Diabetes Mellitus , Fundus Oculi , Machine Learning , Predictive Value of Tests , Self Report , Humans , Female , Male , Aged , Middle Aged , Time Factors , Diabetes Mellitus/diagnosis , Diabetes Mellitus/epidemiology , Brazil/epidemiology , Adult , Databases, Factual , Diabetic Retinopathy/diagnosis , Diabetic Retinopathy/epidemiology , Data Mining/methods , Reproducibility of Results , Image Interpretation, Computer-Assisted
ABSTRACT
OBJECTIVES: During the COVID-19 pandemic, convolutional neural networks (CNNs) have been used in clinical medicine (eg, X-ray classification). Whether CNNs could inform the epidemiology of COVID-19 by classifying street images according to COVID-19 risk is unknown, yet it could pinpoint high-risk places and relevant features of the built environment. In a feasibility study, we trained CNNs to classify the area surrounding bus stops (Lima, Peru) into moderate or extreme COVID-19 risk. DESIGN: CNN analysis based on images of bus stops and the surrounding area. We used transfer learning and updated the output layer of five CNNs: NASNetLarge, InceptionResNetV2, Xception, ResNet152V2 and ResNet101V2. We chose the best-performing CNN, which was further tuned. We used Grad-CAM to understand the classification process. SETTING: Bus stops in Lima, Peru. We used five images per bus stop. PRIMARY AND SECONDARY OUTCOME MEASURES: Bus stop images were classified according to COVID-19 risk into two labels: moderate or extreme. RESULTS: NASNetLarge outperformed the other CNNs except in the recall metric for the moderate label and in the precision metric for the extreme label; ResNet152V2 performed better in these two metrics (85% vs 76% and 63% vs 60%, respectively). NASNetLarge was further tuned. The best recall (75%) and F1 score (65%) for the extreme label were reached with data augmentation techniques. Areas close to buildings or with people were often classified as extreme risk. CONCLUSIONS: This feasibility study showed that CNNs have the potential to classify street images according to levels of COVID-19 risk. In addition to applications in clinical medicine, CNNs and street images could advance the epidemiology of COVID-19 at the population level.
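As a minimal sketch of the transfer-learning step — assuming TensorFlow/Keras with ImageNet weights and synthetic placeholder images; the authors' exact classification head, optimizer and augmentation settings are not given in the abstract — the approach could look as follows:

    import tensorflow as tf

    # Pretrained NASNetLarge as a frozen feature extractor (expects 331x331 RGB;
    # a real pipeline would first apply tf.keras.applications.nasnet.preprocess_input).
    base = tf.keras.applications.NASNetLarge(
        include_top=False, weights="imagenet", input_shape=(331, 331, 3), pooling="avg"
    )
    base.trainable = False

    # Simple data augmentation plus a new binary output layer:
    # moderate (0) vs extreme (1) COVID-19 risk.
    model = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),
        base,
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )

    # Placeholder tensors stand in for the bus-stop images and risk labels.
    x = tf.random.uniform((4, 331, 331, 3))
    y = tf.constant([0.0, 1.0, 0.0, 1.0])
    model.fit(x, y, epochs=1)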
Subject(s)
COVID-19 , COVID-19/epidemiology , Feasibility Studies , Humans , Neural Networks, Computer , Pandemics , Peru/epidemiology
ABSTRACT
Global targets to reduce salt intake have been proposed, but their monitoring is challenged by the lack of population-based data on salt consumption. We developed a machine learning (ML) model to predict salt consumption at the population level from simple predictors and applied this model to national surveys in 54 countries. We used 21 surveys with spot urine samples for ML model derivation and validation; we developed a supervised ML regression model based on sex, age, weight, height, and systolic and diastolic blood pressure. We then applied the ML model to 54 new surveys to quantify mean salt consumption in the population. The pooled dataset in which we developed the ML model included 49,776 people. Overall, there were no substantial differences between the observed and ML-predicted mean salt intake (p<0.001). The pooled dataset to which we applied the ML model included 166,677 people; the predicted mean salt consumption ranged from 6.8 g/day (95% CI: 6.8-6.8 g/day) in Eritrea to 10.0 g/day (95% CI: 9.9-10.0 g/day) in American Samoa. The countries with the highest predicted mean salt intake were in the Western Pacific, and the lowest predicted intake was found in Africa. The country-specific predicted mean salt intake was within a reasonable margin of the best available evidence. An ML model based on readily available predictors estimated daily salt consumption with good accuracy. This model could be used to predict mean salt consumption in general populations where urine samples are not available.
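As an illustration of the derive-then-apply workflow — the abstract does not name the regression algorithm, so the random forest regressor, synthetic data and column names below are assumptions:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    FEATURES = ["sex", "age", "weight", "height", "sbp", "dbp"]

    # Derivation data: surveys where spot-urine-based salt intake (g/day) is known.
    # Synthetic values stand in for the pooled survey data.
    derivation = pd.DataFrame(rng.normal(size=(1000, 6)), columns=FEATURES)
    derivation["salt_g_day"] = 9 + rng.normal(scale=2, size=1000)

    reg = RandomForestRegressor(n_estimators=500, random_state=0)
    reg.fit(derivation[FEATURES], derivation["salt_g_day"])

    # Application: a new survey with the same predictors but no urine samples;
    # the mean of the individual predictions estimates population salt intake.
    new_survey = pd.DataFrame(rng.normal(size=(500, 6)), columns=FEATURES)
    predicted = reg.predict(new_survey[FEATURES])
    print(f"predicted mean salt intake: {predicted.mean():.1f} g/day")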
Subject(s)
Machine Learning , Sodium Chloride, Dietary/urine , Blood Pressure , Humans
ABSTRACT
There is currently great interest in developing accurate wireless indoor localization mechanisms that enable the implementation of many consumer-oriented services. Among the many proposals, wireless indoor localization mechanisms based on the Received Signal Strength Indication (RSSI) are being widely explored. Most studies have focused on evaluating the capabilities of different mobile device brands and wireless network technologies. Furthermore, different parameters and algorithms have been proposed as a means of improving the accuracy of wireless-based localization mechanisms. In this paper, we focus on tuning the RSSI fingerprint used in the implementation of a Bluetooth Low Energy 4.0 (BLE4.0) localization mechanism. Following a holistic approach, we start by assessing the capabilities of two Bluetooth sensor/receiver devices. We then evaluate the relevance of the RSSI fingerprint reported by each BLE4.0 beacon operating at various transmission power levels using feature selection techniques. Based on our findings, we use two classification algorithms to improve the setting of the transmission power level of each BLE4.0 beacon. Our main findings show that our proposal can greatly improve localization accuracy by setting a custom transmission power level for each BLE4.0 beacon.
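As a rough, self-contained sketch of the fingerprint-relevance and localization steps — the specific feature selection and classification algorithms are not named above, so mutual information and k-nearest neighbours are stand-ins, and the RSSI data are synthetic:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    # Fingerprint matrix: one row per calibration sample, one column per
    # BLE4.0 beacon (mean RSSI in dBm); labels are indoor reference points.
    X = rng.normal(loc=-70, scale=8, size=(300, 10))
    y = rng.integers(0, 5, size=300)

    # Rank each beacon's relevance to the location label (feature selection).
    relevance = mutual_info_classif(X, y, random_state=0)
    top_beacons = np.argsort(relevance)[::-1][:5]

    # Classify a new RSSI fingerprint against the retained beacons.
    knn = KNeighborsClassifier(n_neighbors=3).fit(X[:, top_beacons], y)
    new_fingerprint = rng.normal(loc=-70, scale=8, size=(1, 10))
    print(knn.predict(new_fingerprint[:, top_beacons]))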