J Diabetes ; 2019 Jul 09.
Article in English | MEDLINE | ID: mdl-31290214


BACKGROUND: An unhealthy diet is an important risk factor for diabetes, which is a major public health problem in China. Internet tools provide large-scale, passively collected data that reveal people's dietary preferences and their relationship with diabetes risk. METHODS: Dietary preference labels for 212 341 708 individuals were created from Internet data from online search and shopping software. Metabolic data from the 2010 China Noncommunicable Disease Surveillance, which had 98 658 participants, were used to estimate the relationship between the geographical distribution of dietary preferences and diabetes risk. RESULTS: Chinese dietary preferences showed distinct geographical distributions related to local climate and consumption level. The regional proportion of individuals with a fried food preference was significantly positively correlated with diabetes prevalence, hypertension prevalence, and body mass index (BMI). Similarly, the regional proportion with a grilled food preference was significantly positively correlated with the prevalence of diabetes and hypertension. In contrast, the regional proportion with a spicy food preference was negatively correlated with diabetes prevalence. The regional proportion with a sweet food preference was positively related to diabetes prevalence. When dietary preference data were used to predict regional diabetes prevalence, hypertension prevalence, and BMI, the mean errors (95% CI) between the paired predicted and observed values were 9.8% (6.9%-12.7%), 7.5% (5.0%-10.0%), and 1.6% (1.2%-2.0%), respectively. CONCLUSIONS: Fried, grilled, and sweet food preferences were positively related to diabetes risk, whereas spicy food preference was negatively correlated with diabetes risk. Dietary preferences derived from passively collected Internet data could be used to predict regional diabetes prevalence, hypertension prevalence, and BMI, and showed good value for public health monitoring.
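The abstract reports correlations between regional preference proportions and health outcomes, plus a mean prediction error with a 95% CI, but does not state which correlation coefficient or error definition was used. The Python sketch below assumes Pearson correlation and a mean relative error |predicted - observed| / observed with a normal-approximation interval. The function names pearson_r and mean_relative_error are hypothetical, and the inputs would be per-region values, not the study's data.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of regional values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_relative_error(predicted, observed, z=1.96):
    """Mean of |pred - obs| / obs across regions, with a normal-approximation CI.

    Assumption: this error definition is illustrative; observed values must be
    nonzero and at least two regions are required for the standard deviation.
    """
    errs = [abs(p - o) / o for p, o in zip(predicted, observed)]
    m = mean(errs)
    half = z * stdev(errs) / math.sqrt(len(errs))
    return m, (m - half, m + half)
```

A positive pearson_r between, say, a region's fried-food preference proportion and its diabetes prevalence corresponds to the kind of association the RESULTS section describes; the correlation alone says nothing about causation.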

IEEE Trans Pattern Anal Mach Intell ; 41(2): 323-336, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29994559


Online multiple-output regression is an important machine learning technique for modeling, predicting, and compressing multi-dimensional correlated data streams. In this paper, we propose a novel online multiple-output regression method for streaming data, called MORES. MORES can dynamically learn the structure of the regression coefficients to facilitate continuous refinement of the model. Because the limited expressive ability of regression models often leaves the residual errors dependent, MORES also dynamically learns and leverages the structure of the residual errors to improve prediction accuracy. Moreover, we introduce three modified covariance matrices that extract the necessary information from all the data seen so far for training, and we assign different weights to samples so as to track the evolving characteristics of the data streams. Furthermore, we design an efficient algorithm to optimize the proposed objective function and develop an efficient online eigenvalue decomposition algorithm for the modified covariance matrix. Finally, we analyze the convergence of MORES under certain ideal conditions. Experiments on two synthetic datasets and three real-world datasets validate the effectiveness and efficiency of MORES. In addition, MORES can process at least 2,000 instances per second (including training and testing) on the three real-world datasets, more than 12 times faster than the state-of-the-art online learning algorithm.
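MORES learns coefficient and residual structure and uses three modified covariance matrices whose exact form the abstract does not give. As a much simpler stand-in, the sketch below shows only the sample-weighting idea: an online multi-output ridge regressor whose covariance statistics are decayed by a forgetting factor, so a sample seen s steps ago carries weight lam**s. The class name ForgettingMultiOutputRegressor and the parameters lam and ridge are hypothetical; this is not the MORES algorithm.

```python
from typing import List

Matrix = List[List[float]]


class ForgettingMultiOutputRegressor:
    """Online ridge regression mapping d inputs to k outputs with exponential forgetting.

    Hypothetical stand-in, not MORES: lam < 1 down-weights older samples so the
    model can track a drifting stream; ridge keeps the linear system well posed.
    """

    def __init__(self, d: int, k: int, lam: float = 0.99, ridge: float = 1e-3):
        self.d, self.k, self.lam, self.ridge = d, k, lam, ridge
        self.cxx: Matrix = [[0.0] * d for _ in range(d)]  # weighted sum of x x^T
        self.cxy: Matrix = [[0.0] * k for _ in range(d)]  # weighted sum of x y^T

    def update(self, x: List[float], y: List[float]) -> None:
        # Decay the old statistics by lam, then add the new outer products.
        for i in range(self.d):
            for j in range(self.d):
                self.cxx[i][j] = self.lam * self.cxx[i][j] + x[i] * x[j]
            for j in range(self.k):
                self.cxy[i][j] = self.lam * self.cxy[i][j] + x[i] * y[j]

    def coefficients(self) -> Matrix:
        # Solve (Cxx + ridge * I) W = Cxy by Gauss-Jordan elimination with partial pivoting.
        d = self.d
        aug = [self.cxx[i][:] + self.cxy[i][:] for i in range(d)]
        for i in range(d):
            aug[i][i] += self.ridge
        for col in range(d):
            piv = max(range(col, d), key=lambda r: abs(aug[r][col]))
            aug[col], aug[piv] = aug[piv], aug[col]
            p = aug[col][col]
            aug[col] = [v / p for v in aug[col]]
            for r in range(d):
                if r != col:
                    f = aug[r][col]
                    aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
        return [row[d:] for row in aug]  # d x k coefficient matrix W

    def predict(self, x: List[float]) -> List[float]:
        w = self.coefficients()
        return [sum(x[i] * w[i][j] for i in range(self.d)) for j in range(self.k)]
```

Recomputing the solve on every prediction keeps the sketch short; a practical online method would instead update a factorization or, as the abstract describes for MORES, an eigenvalue decomposition incrementally.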

IEEE Trans Neural Netw Learn Syst ; 27(12): 2768-2775, 2016 Dec.
Article in English | MEDLINE | ID: mdl-26863680


In this brief, we propose a new max-margin-based discriminative feature learning method. In particular, we aim to learn a low-dimensional feature representation that maximizes the global margin of the data while making samples from the same class as close as possible. To enhance robustness to noise, we use a regularization term that makes the transformation matrix sparse in rows. In addition, we learn and leverage the correlations among multiple categories to assist in learning discriminative features. Experimental results demonstrate the effectiveness of the proposed method compared with related state-of-the-art methods.
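The abstract does not name the row-sparsity regularizer. A common choice for making a transformation matrix sparse in rows is the ℓ2,1 norm (the sum of the Euclidean norms of the rows), which proximal-gradient solvers handle with row-wise shrinkage. The Python sketch below assumes that choice; l21_norm and prox_l21 are hypothetical helper names, and the max-margin and class-correlation terms of the method are not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def l21_norm(W: Matrix) -> float:
    """Sum over rows of each row's Euclidean norm (assumed row-sparsity penalty)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)


def prox_l21(W: Matrix, tau: float) -> Matrix:
    """Proximal operator of tau * ||W||_{2,1}.

    Each row is shrunk toward zero by tau in Euclidean norm; rows whose norm
    is at most tau become exactly zero.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out
```

Because each row of the transformation matrix corresponds to one input feature, a zeroed row means that feature no longer contributes to the learned low-dimensional representation, which is how row sparsity can suppress noisy features.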

IEEE Trans Cybern ; 45(11): 2522-34, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26470062


In this paper, we propose a novel feature selection-based method for facial age estimation. Face aging is a typical temporal process, and facial images should exhibit certain ordinal patterns in the aging feature space. From a geometric perspective, a facial image can usually be seen as sampled from a low-dimensional manifold embedded in the original high-dimensional feature space. Thus, we first measure the energy of each feature in preserving the underlying local structure information and the ordinal information of the facial images, respectively, and then learn a low-dimensional aging representation that maximally preserves both kinds of information. To further improve performance, we eliminate as much redundant local and ordinal information as possible by minimizing the nonlinear correlation and rank correlation among features. Finally, we formulate all these objectives as a unified optimization problem, which is similar in form to linear discriminant analysis. Since collecting labeled facial aging images is expensive in practice, we extend the proposed supervised method to a semi-supervised learning mode, comprising a semi-supervised feature selection method and a semi-supervised age prediction algorithm. Extensive experiments on the FACES dataset, the Images of Groups dataset, and the FG-NET aging dataset show the effectiveness of the proposed algorithms compared with state-of-the-art methods.
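The abstract mentions minimizing rank correlation among features to remove redundant ordinal information. The sketch below shows only that ingredient: Spearman rank correlation (with average ranks for ties) and a greedy filter that keeps high-scoring features while skipping ones strongly rank-correlated with features already kept. The greedy filter, the per-feature scores it consumes, the max_rho threshold, and all function names are hypothetical; the paper instead folds redundancy into a unified optimization problem similar in form to linear discriminant analysis.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den if den > 0 else 0.0


def greedy_low_redundancy(features: List[List[float]], scores: List[float],
                          max_rho: float = 0.9) -> List[int]:
    """Visit features by descending score; keep one only if |rho| <= max_rho
    against every feature already kept (hypothetical selection rule)."""
    kept: List[int] = []
    for idx in sorted(range(len(features)), key=lambda i: -scores[i]):
        if all(abs(spearman_rho(features[idx], features[k])) <= max_rho for k in kept):
            kept.append(idx)
    return kept
```

Rank correlation treats any monotone relationship as fully redundant: a feature and its square (on positive values) have Spearman correlation 1 even though their Pearson correlation is below 1, which suits the ordinal view of aging described above.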

