Results 1 - 20 of 23
1.
Front Nutr ; 11: 1342823, 2024.
Article in English | MEDLINE | ID: mdl-38595788

ABSTRACT

Introduction: In this research, we introduce the NutriGreen dataset, a collection of images of branded food products intended for training segmentation models that detect various labels on food packaging. Each image in the dataset comes with three distinct labels: one indicating its nutritional quality using the Nutri-Score, another denoting whether it is of vegan or vegetarian origin with the V-Label, and a third displaying the EU organic certification (BIO) logo. Methods: To create the dataset, we used a semi-automatic annotation pipeline that combines domain expert annotation with automatic annotation by a deep learning model. Results: The dataset comprises a total of 10,472 images. Among these, the Nutri-Score label is distributed across five sub-labels: Nutri-Score grade A with 1,250 images, grade B with 1,107 images, grade C with 867 images, grade D with 1,001 images, and grade E with 967 images. Additionally, there are 870 images featuring the V-Label, 2,328 images showcasing the BIO label, and 3,201 images without any of the aforementioned labels. Furthermore, we fine-tuned the YOLOv5 segmentation model to demonstrate the practicality of using these annotated datasets, achieving an accuracy of 94.0%. Discussion: These promising results indicate that this dataset has significant potential for training systems capable of detecting food labels. Moreover, it can serve as a valuable benchmark dataset for emerging computer vision systems.

2.
Artif Intell Med ; 142: 102586, 2023 08.
Article in English | MEDLINE | ID: mdl-37316100

ABSTRACT

It is crucial to keep up with the new biomedical knowledge presented in the scientific literature. To this end, Information Extraction pipelines can help to automatically extract meaningful relations from textual data, which then require additional checks by domain experts. In the last two decades, a lot of work has been done on extracting relations between phenotype and health concepts; however, relations with food entities, which are among the most important environmental concepts, have never been explored. In this study, we propose FooDis, a novel Information Extraction pipeline that employs state-of-the-art approaches in Natural Language Processing to mine abstracts of biomedical scientific papers and automatically suggest potential cause or treat relations between food and disease entities in different existing semantic resources. A comparison with already known relations indicates that the relations predicted by our pipeline match for 90% of the food-disease pairs that are common to our results and the NutriChem database, and for 93% of the pairs common to our results and the DietRx platform. The comparison also shows that the FooDis pipeline can suggest relations with high precision. The FooDis pipeline can further be used to dynamically discover new relations between food and diseases, which should be checked by domain experts and then used to populate some of the existing resources used by NutriChem and DietRx.


Subjects
Information Storage and Retrieval , Natural Language Processing , Databases, Factual , Phenotype
3.
Sci Rep ; 13(1): 7815, 2023 05 15.
Article in English | MEDLINE | ID: mdl-37188766

ABSTRACT

Knowledge about the interactions between dietary and biomedical factors is scattered across countless research articles in unstructured form (e.g., text and images) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist; however, they need to be extended with relations between food and biomedical entities. In this study, we evaluate the performance of three state-of-the-art relation-mining pipelines (FooDis, FoodChem and ChemDis), which extract relations between food, chemical and disease entities from textual data. We perform two case studies in which relations were automatically extracted by the pipelines and validated by domain experts. The results show that the pipelines can extract relations with an average precision of around 70%, making new discoveries available to domain experts with reduced human effort, since the experts need only evaluate the results instead of finding and reading all new scientific papers.


Subjects
Data Mining , Pattern Recognition, Automated , Humans , Data Mining/methods , Language , Natural Language Processing
4.
Database (Oxford) ; 2022, 2022 12 16.
Article in English | MEDLINE | ID: mdl-36526439

ABSTRACT

In recent decades, a great amount of work has been done on predictive modeling of issues related to human and environmental health. Resolving healthcare-related issues is made possible by the existence of several biomedical vocabularies and standards, which play a crucial role in understanding health information, together with a large amount of health-related data. However, despite the large number of available resources and the work done in the health and environmental domains, there is a lack of semantic resources that can be utilized in the food and nutrition domain, as well as of interconnections between them. For this purpose, in the European Food Safety Authority-funded project CAFETERIA, we have developed the first annotated corpus of 500 scientific abstracts, which consists of 6407 annotated food entities with regard to the Hansard taxonomy, 4299 for FoodOn and 3623 for SNOMED-CT. The CafeteriaSA corpus will enable the further development of natural language processing methods for extracting food information from scientific textual data. Database URL: https://zenodo.org/record/6683798#.Y49wIezMJJF.


Subjects
Natural Language Processing , Semantics , Humans , Information Storage and Retrieval , Databases, Factual
5.
Foods ; 11(17), 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36076868

ABSTRACT

Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low-resourced. Annotated corpora are very useful tools for researchers and experts of the domain in question, as well as for data scientists for analysis. In this paper, we present the annotation process of food consumption data (recipes) with semantic tags from different semantic resources: the Hansard taxonomy, the FoodOn ontology, the SNOMED CT terminology and the FoodEx2 classification system. FoodBase is an annotated corpus of food entities (recipes) that includes a curated version of 1000 instances, considered a gold standard. In this study, we use the curated version of FoodBase and two different annotation approaches: the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the gold standard of the FoodBase corpus, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data (recipes) annotated with semantic tags from the aforementioned four different external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further utilized for developing and training different information extraction pipelines using state-of-the-art NLP approaches for tracing knowledge about food safety applications.

6.
Expert Syst Appl ; 209: 118377, 2022 Dec 15.
Article in English | MEDLINE | ID: mdl-35945970

ABSTRACT

Many factors significantly influence the outcomes of infectious diseases such as COVID-19. A significant focus needs to be put on dietary habits as environmental factors, since imbalanced diets are considered to contribute to chronic diseases. However, not enough effort has been made to assess these relations. So far, studies in the field have shown that comorbid conditions influence the severity of COVID-19 symptoms in infected patients. Furthermore, COVID-19 has exhibited seasonal patterns in its spread; therefore, considering weather-related factors in the analysis of mortality rates might provide a more relevant explanation of the disease's progression. In this work, we provide an explainable analysis of the global risk factors for COVID-19 mortality on a national scale, considering dietary habits fused with data on past comorbidity prevalence and environmental factors such as seasonally averaged temperature, geolocation, economic and development indices, and undernourishment and obesity rates. The innovation in this paper lies in the explainability of the obtained results and, equally, in the data fusion methods and the broad context considered in the analysis. Apart from a country's age and gender distribution, which has already been proven to influence COVID-19 mortality rates, our empirical analysis shows that countries with imbalanced dietary habits generally tend to have higher COVID-19 mortality predictions. Ultimately, we show that fusing the dietary data set with the geo-economic variables provides more accurate modeling of the country-wise COVID-19 mortality rates than considering dietary habits alone, supporting the hypothesis that fusing factors from different contexts contributes to a better descriptive analysis of COVID-19 mortality rates.

7.
Sci Rep ; 12(1): 6508, 2022 04 20.
Article in English | MEDLINE | ID: mdl-35444165

ABSTRACT

Alzheimer's disease is still a field of research with many open questions. The complexity of the disease prevents early diagnosis before visible symptoms affecting the individual's cognitive capabilities occur. This research presents an in-depth analysis of a large data set encompassing medical, cognitive and lifestyle measurements from more than 12,000 individuals. Several hypotheses were established, whose validity has been questioned in light of the obtained results. The importance of appropriate experimental design is highly stressed in the research. Thus, a sequence of methods for handling missing data, redundancy, data imbalance, and correlation analysis was applied to preprocess the data set appropriately, and an XGBoost model was then trained and evaluated with special attention to hyperparameter tuning. The model was explained using the Shapley values produced by the SHAP method. XGBoost produced an F1-score of 0.84 and as such is considered highly competitive among those published in the literature. This achievement, however, was not the main contribution of this paper. The goal of this research was to achieve global and local interpretability of the intelligent model and to derive valuable conclusions about the established hypotheses. Those methods led to a single scheme that presents either the positive or the negative influence of the values of each of the features whose importance has been confirmed by means of Shapley values. This scheme might be considered an additional source of knowledge for physicians and other experts concerned with the exact diagnosis of early-stage Alzheimer's disease. The conclusions derived from the intelligent model's data-driven interpretability challenged all the established hypotheses. This research clearly showed the importance of an explainable machine learning approach that opens the black box and clearly unveils the relationships among the features and the diagnoses.


Subjects
Alzheimer Disease , Alzheimer Disease/diagnosis , Humans , Machine Learning , Problem Solving
8.
J Med Internet Res ; 23(8): e28229, 2021 08 09.
Article in English | MEDLINE | ID: mdl-34383671

ABSTRACT

BACKGROUND: Recently, food science has been garnering a lot of attention. There are many open research questions on the interactions of food, as one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which brings to attention the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, which often depend on some external resources. However, an annotated corpus with food entities along with their normalization was published in 2019 by using several food semantic resources. OBJECTIVE: In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction. METHODS: We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags. RESULTS: All BERT models provided very promising results, with macro F1 scores of 93.30% to 94.31% in the task of distinguishing food versus nonfood entities, which represents the new state of the art in food information extraction. In the tasks where semantic tags are predicted, all BERT models again obtained very promising results, with macro F1 scores ranging from 73.39% to 78.96%. CONCLUSIONS: FoodNER can be used to extract and annotate food entities in 5 different tasks: food versus nonfood entities and distinguishing food entities on the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
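
Note on the metric: the macro F1 score reported in this abstract is the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that definition only, not the authors' evaluation code; the function name `macro_f1` and the toy tag sequences are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical token-level tags (illustration only, not data from the study).
gold = ["FOOD", "O", "O", "FOOD", "O", "FOOD"]
pred = ["FOOD", "O", "FOOD", "FOOD", "O", "O"]
score = macro_f1(gold, pred)
```

Because every class contributes equally, macro F1 penalizes poor performance on rare semantic tags more than an accuracy figure would.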


Subjects
Algorithms , Natural Language Processing , Humans , Information Storage and Retrieval , Machine Learning , Semantics
9.
Food Qual Prefer ; 93: 104231, 2021 Oct.
Article in English | MEDLINE | ID: mdl-36569642

ABSTRACT

We aimed to evaluate the changes in the eating behaviours of the adult population across 16 European countries due to the COVID-19 confinement and to evaluate whether these changes were related to the severity of the containment measures applied in each country. Data were collected with an anonymous online self-reported questionnaire covering socio-demographic characteristics, the validated 14-item Mediterranean diet (MedDiet) Adherence Screener (MEDAS) as a reference of a healthy diet, and eating and lifestyle behaviours prior to and during the COVID-19 confinement. The study included an adult population residing in 16 European countries at the time of the survey. An aggregated Stringency Index (SI) score, based on data from the Oxford COVID-19 Government Response Tracker, was calculated for each country at the time the questionnaire was distributed (range: 0-100). A total of 36,185 participants completed the questionnaire (77.6% female, 75.2% with a high educational level and 42.7% aged between 21 and 35 years). In comparison to pre-confinement, a significantly higher adherence to the MedDiet during the confinement was observed across all countries (overall MEDAS score prior to and during confinement: 5.23 ± 2.06 vs. 6.15 ± 2.06; p < 0.001), with the largest increase seen in Greece and North Macedonia. The highest adherence to the MedDiet during confinement was found in Spain and Portugal (7.18 ± 1.84 and 7.34 ± 1.95, respectively). Stricter contingency restrictions seemed to lead to a significantly higher increase in adherence to the MedDiet. The findings from this cross-sectional study could be used to inform current diet-related public health guidelines to ensure that optimal nutrition is followed among the population, which in turn would help to alleviate the current public health crisis.

10.
Front Nutr ; 8: 795802, 2021.
Article in English | MEDLINE | ID: mdl-35402471

ABSTRACT

The focus of the current paper is the design of responsible governance of food consumer science e-infrastructure, using the Determinants and Intake Data Platform (DI Data Platform) as a case study. One of the key challenges for implementation of the DI Data Platform is how to develop responsible governance that observes the ethical and legal frameworks of big data research and innovation, whilst simultaneously capitalizing on the huge opportunities offered by open science and the use of big data in food consumer science research. We address this challenge with a specific focus on four key governance considerations: data type and technology; data ownership and intellectual property; data privacy and security; and institutional arrangements for ethical governance. The paper concludes with a set of responsible research governance principles that can inform the implementation of the DI Data Platform, in particular: consider both individual and group privacy; monitor the power and control (e.g., between the scientist and the research participant) in the process of research; question the veracity of new knowledge based on big data analytics; and understand the diverse interpretations of scientists' responsibility across different jurisdictions.

11.
Nutrients ; 12(12), 2020 Dec 10.
Article in English | MEDLINE | ID: mdl-33321959

ABSTRACT

Food frequency questionnaires (FFQs) are the most commonly selected tools in nutrition monitoring, as they are inexpensive, easily implemented and provide useful information regarding dietary intake. They are usually carefully drafted by experts from nutritional and/or medical fields and can be validated by using other dietary monitoring techniques. FFQs can become very extensive, which could indicate that some of the questions are less significant than others and could be omitted without losing too much information. In this paper, machine learning is used to explore how reducing the number of questions affects the predicted nutrient values and diet quality score. The paper addresses the problem of removing redundant questions and finding the best subset of questions in the Extended Short Form Food Frequency Questionnaire (ESFFFQ), developed as part of the H2020 project WellCo. Eight common machine-learning algorithms were compared on different subsets of questions by using the PROMETHEE method, which compares methods and subsets via multiple performance measures. According to the results, for some of the targets, specifically sugar intake, fiber intake and protein intake, a smaller subset of questions is sufficient to predict diet quality scores. Additionally, for smaller subsets of questions, machine-learning algorithms generally perform better than statistical methods for predicting intake and diet quality scores. The proposed method could therefore be useful for finding the most informative subsets of questions in other FFQs as well. This could help experts develop FFQs that provide the necessary information and are not overbearing for those answering.


Subjects
Diet Surveys/methods , Diet, Healthy/statistics & numerical data , Diet/statistics & numerical data , Machine Learning , Surveys and Questionnaires/standards , Adult , Clinical Decision Rules , Diet Surveys/standards , Female , Humans , Male , Nutrition Assessment , Predictive Value of Tests , Regression Analysis , Reproducibility of Results
12.
Trends Food Sci Technol ; 104: 268-272, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32905099

ABSTRACT

BACKGROUND: The COVID-19 pandemic affects all aspects of human life, including food consumption. The changes in the food production and supply processes introduce changes to global dietary patterns. SCOPE AND APPROACH: To study the impact of COVID-19 on the food consumption process, we analyzed two data sets that consist of food preparation recipes published before (69,444) and during (10,009) the quarantine period. Since working with large data sets is a time-consuming task, we applied a recently proposed artificial intelligence approach called DietHub. The approach uses the recipe preparation description (i.e. text) and automatically provides a list of main ingredients annotated using the Hansard semantic tags. After extracting the semantic tags of the ingredients for every recipe, we compared the food consumption patterns between the two data sets by comparing the relative frequency of the ingredients that compose the recipes. KEY FINDINGS AND CONCLUSIONS: Using the AI methodology, the changes in the food consumption patterns before and during the COVID-19 pandemic are obvious. The highest positive difference in food consumption can be found in foods such as "Pulses/ plants producing pulses", "Pancake/Tortilla/Oatcake", and "Soup/pottage", which increase by 300%, 280%, and 100%, respectively. Conversely, the largest decrease in consumption can be found for foods such as "Order Perciformes (type of fish)", "Corn/cereals/grain", and "Wine-making", with a reduction of 50%, 40%, and 30%, respectively. This kind of analysis is valuable in times of crisis and emergencies, and is a very good example of the scientific support that regulators require in order to respond quickly and appropriately.
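
Note on the comparison: the relative-frequency comparison described in this abstract can be illustrated with a short sketch. It assumes that a tag's relative frequency is the share of recipes in which the tag occurs at least once; the abstract does not specify the exact definition, and the function names and toy recipes below are hypothetical rather than taken from DietHub.

```python
from collections import Counter


def relative_frequencies(recipes):
    """Share of recipes in which each semantic tag occurs at least once.

    `recipes` is a non-empty list of tag lists, one list per recipe.
    """
    counts = Counter(tag for tags in recipes for tag in set(tags))
    return {tag: n / len(recipes) for tag, n in counts.items()}


def percent_change(before, during):
    """Percentage change in relative frequency for tags seen before."""
    freq_before = relative_frequencies(before)
    freq_during = relative_frequencies(during)
    return {
        tag: 100 * (freq_during.get(tag, 0.0) - f) / f
        for tag, f in freq_before.items()
    }


# Toy data (illustration only): four recipes before, four during.
before = [["Soup/pottage", "Wine-making"], ["Wine-making"], ["Bread"], ["Bread"]]
during = [["Soup/pottage"], ["Soup/pottage", "Bread"], ["Wine-making"], ["Bread"]]
changes = percent_change(before, during)
```

Normalizing by the number of recipes matters here because the two data sets differ greatly in size (69,444 versus 10,009 recipes), so raw counts would not be comparable.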

13.
Molecules ; 25(12), 2020 Jun 23.
Article in English | MEDLINE | ID: mdl-32586041

ABSTRACT

This study examined the percentage and stable isotope ratios of fatty acids in milk to study seasonal, yearly, and regional variability. A total of 231 raw cow milk samples were analyzed. Samples were taken twice per year in 2012, 2013, and 2014, in winter and summer, covering four distinct geographical regions in Slovenia: Mediterranean, Alpine, Dinaric, and Pannonian. A discriminant analysis model based on fatty acid composition was effective in discriminating milk according to the year/season of production (86.9%), while geographical origin discrimination was less successful (64.1%). The stable isotope composition of fatty acids also proved to be a better biomarker of metabolic transformation processes in ruminants than a discriminator of the origin of milk. Further, it was observed that milk from the Alpine and Mediterranean regions was healthier due to its higher percentage of ω-3 polyunsaturated fatty acids and conjugated linoleic acid.


Subjects
Carbon Isotopes/analysis , Fatty Acids/analysis , Geography , Milk/chemistry , Seasons , Animals , Cattle , Cluster Analysis , Discriminant Analysis , Slovenia
14.
Food Chem Toxicol ; 141: 111368, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32380076

ABSTRACT

Missing data are a common problem in most research fields and introduce an element of ambiguity into data analysis. They can arise for different reasons: mishandling of samples, measurement error, deleted aberrant values or simply lack of analysis. The nutrition domain is no exception to the problem of missing data. This paper addresses the problem of missing data in food composition databases (FCDBs). Missing data result in incomplete FCDBs, which have limited usage, because any dietary assessment can be performed only on a complete dataset. Most often, this problem is resolved by calculating means/medians from existing data in the same database or borrowing data from other FCDBs. These solutions introduce significant error. We focus on missing data imputation techniques based on methods for substituting missing values with statistical prediction: Non-Negative Matrix Factorization (NMF), Multiple Imputations by Chained Equations (MICE), Nonparametric Missing Value Imputation using Random Forest (MissForest), and K-Nearest Neighbors (KNN), and compare them with commonly used approaches: fill-in with mean and fill-in with median. The data used were from national FCDBs collected by EuroFIR (European Food Information Resource Network). The results show that the state-of-the-art methods for imputation yield better results than the traditional approaches.
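
Note on the baselines: the traditional approaches named in this abstract, fill-in with mean and fill-in with median, can be sketched in a few lines. This is an illustrative sketch only; the function name `fill_missing`, the use of `None` to mark a missing value, and the toy column are assumptions, and the predictive methods (NMF, MICE, MissForest, KNN) are not shown.

```python
from statistics import mean, median


def fill_missing(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if value is None else value for value in column]


# Hypothetical nutrient column (e.g. mg per 100 g) with one missing entry.
nutrient = [1.0, None, 3.0, 8.0]
by_mean = fill_missing(nutrient, "mean")
by_median = fill_missing(nutrient, "median")
```

Both baselines assign the same value to every missing entry in a column regardless of the food item, which is one plausible reason why per-item predictive methods can do better.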


Subjects
Data Interpretation, Statistical , Database Management Systems , Food Analysis , Algorithms , Nutritive Value
15.
Food Chem ; 326: 126958, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-32416418

ABSTRACT

This work examines the use of stable isotopes and elemental composition for determining the geographical origin and authenticity of cow milk from four geographical regions of Slovenia. Samples (277) were collected during summer and winter (2012-2014). It was possible to discriminate milk samples according to the year, season and production region using discriminant analysis (DA). The overall prediction ability was 84.6% for temporal differences and 56.4% for regional differences. It was also possible to discriminate milk from three geographic regions, although Alpine samples overlap with Dinaric and Pannonian ones. Prediction ability was the highest for the Pannonian (82.1%) and lowest (26.9%) for the Alpine region. Pairwise comparison using OPLS-DA also displayed good regional predictability (≥0.77), with δ13Ccas values and Br content carrying the most variance. A model based on DD-SIMCA was also developed and applied to the control of Slovenian milk. The results revealed the mislabeling of three Slovenian milk products.


Subjects
Isotopes/analysis , Milk/chemistry , Animals , Carbon Isotopes/analysis , Cattle , Discriminant Analysis , Female , Geography , Nitrogen Isotopes/analysis , Seasons , Slovakia
16.
Food Chem Toxicol ; 138: 111169, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32088249

ABSTRACT

In food and toxicology science, a huge amount of research and other data has been collected. To enable its full utilization, advanced statistical and computer methods are required. All of these data relate to food items but additionally include different kinds of information. Nowadays the consumption of avocado has increased. To understand the full impact of this increased consumption on public health and the environment, different data related to avocado need to be considered. In this paper, we present an approach for representing foods in the form of vectors of continuous numbers (food embeddings) as an alternative to manual indexing. The utility of representing food data as vectors of continuous numbers was evaluated and demonstrated in four tasks: i) automated determination of different food groups, ii) automated detection of the food class for each food concept (raw, derivative or composite), iii) identification of the most similar food concepts for a given food concept, and iv) qualitative evaluation by a food expert. The experimental results showed that this kind of vector representation outperforms the traditional representational methods used for food data analysis, and thus presents a step towards more advanced food data analysis for discovering new knowledge.
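
Note on task iii: finding the most similar food concepts in an embedding space is commonly done by ranking vectors by cosine similarity. The sketch below assumes that measure (the abstract does not state which one the authors used) and uses hypothetical three-dimensional vectors; real food embeddings would have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, embeddings, k=3):
    """Return the k concept names closest to `query`, most similar first."""
    target = embeddings[query]
    ranked = sorted(
        ((cosine(target, vec), name)
         for name, vec in embeddings.items() if name != query),
        reverse=True,
    )
    return [name for _, name in ranked[:k]]


# Hypothetical embeddings (illustration only, not vectors from the study).
toy = {
    "avocado": [0.9, 0.1, 0.0],
    "guacamole": [0.8, 0.3, 0.1],
    "olive oil": [0.5, 0.5, 0.0],
    "wine": [0.0, 0.2, 0.9],
}
neighbours = most_similar("avocado", toy, k=2)
```

Cosine similarity compares direction rather than magnitude, so two concepts count as similar when their vectors point the same way even if their lengths differ.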


Subjects
Data Analysis , Databases, Factual , Food/classification , Taste , Terminology as Topic
17.
Environ Res ; 180: 108820, 2020 01.
Article in English | MEDLINE | ID: mdl-31639654

ABSTRACT

The maternal diet and living environment can affect the levels of chemical elements, the fatty acid (FA) composition and the stable isotopes of FAs (δ13CFA) in human milk. Information obtained from questionnaires is frequently imprecise, thus limiting proper associations between external and internal exposures as well as health effects. In this study, we focused on seafood as a source of potentially toxic and essential elements and nutritional FAs. Concentrations of selected elements in human milk (As, Cd, Cu, Mn, Pb, Se and Zn) were determined using inductively coupled plasma mass spectrometry (ICP-MS), and Hg using cold vapour atomic-absorption spectrometry (CV-AAS). The identification and quantification of FAs in maternal milk were performed by an in-situ trans-esterification method (FAMEs), and the characterization of FAMEs was performed by gas chromatography with a flame ionisation detector (GC-FID). δ13CFA was determined by gas chromatography-combustion-isotope ratio mass spectrometry (GC-C-IRMS). Seventy-four lactating Slovenian women from the coastal area of Koper (KP), with more frequent consumption of seafood, and the inland area of Pomurje (MS), with less frequent seafood consumption, were included in this study. Along with basic statistical analyses, data mining approaches (classification and clustering) were applied to investigate whether FA composition and δ13CFA could improve the information regarding dietary sources of potentially toxic elements. As and Hg levels in milk were found to be significantly higher in populations from KP than in those from MS, and 71% of individual FAs and 30% of individual δ13CFA values in milk differed significantly between the studied areas. In 19 cases, the levels of FAs in milk were higher in KP than in MS; these FAs include C20:5ω3 and C22:6ω3/C24:1ω9, which are typically contained in fish. In 16 cases, the mean percentage of FAs was higher in MS than in KP; these FAs include the PUFAs C18:2ω6, C18:3ω3, and C20:4ω6, which are important for human and infant growth. The difference in δ13C levels of C10:0, C12:0, C14:0, C16:1, C16:0, C18:1ω9c, C22:6ω3, and δ13C 18:0-16:0 between the study groups was statistically significant. In all seven cases where δ13C of FA significantly differed between KP and MS, δ13C was higher in KP, indicating a higher proportion of a marine-based diet. The data mining approaches confirmed that the percentage of selected FAs (iC17:0, C4:0, C18:2ω6t, aC17:0, CLA, and C22:4ω6) and δ13CFA of C18:1ω9c in human milk could be used to distinguish between high and low frequency of fresh seafood consumption.


Subjects
Dietary Exposure/statistics & numerical data , Lactation , Maternal Exposure/statistics & numerical data , Milk, Human , Seafood/statistics & numerical data , Animals , Fatty Acids , Feeding Behavior , Female , Gas Chromatography-Mass Spectrometry , Humans , Milk
18.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31682732

ABSTRACT

The existence of annotated text corpora is essential for the development of public health services and tools based on natural language processing (NLP) and text mining. Recently organized biomedical NLP shared tasks have provided annotated corpora related to different biomedical entities such as genes, phenotypes, drugs, diseases and chemical entities. These are needed to develop named-entity recognition (NER) models that are used for extracting entities from text and finding their relations. However, to the best of our knowledge, there are limited annotated corpora that provide information about food entities, despite food and dietary management being an essential public health issue. Hence, we developed a new annotated corpus of food entities, named FoodBase. It was constructed using recipes extracted from Allrecipes, which is currently the largest food-focused social network. The recipes were selected from five categories: 'Appetizers and Snacks', 'Breakfast and Lunch', 'Dessert', 'Dinner' and 'Drinks'. Semantic tags used for annotating food entities were selected from the Hansard corpus. To extract and annotate food entities, we applied a rule-based food NER method called FoodIE. Since FoodIE provides a weakly annotated corpus, we created a gold standard of FoodBase by manually evaluating the obtained results on 1000 recipes. It consists of 12 844 food entity annotations describing 2105 unique food entities. Additionally, we provided a weakly annotated corpus on an additional 21 790 recipes. It consists of 274 053 food entity annotations, 13 079 of which are unique. The FoodBase corpus is necessary for developing corpus-based NER models for food science, and serves as a new benchmark dataset for machine learning tasks such as multi-class classification, multi-label classification and hierarchical multi-label classification. FoodBase can be used for detecting semantic differences/similarities between food concepts, and ultimately we believe that it will open a new path for learning a food embedding space that can be used in predictive studies.


Subjects
Cooking , Data Curation , Databases, Factual , Food , Natural Language Processing , Humans
19.
Public Health Nutr ; 22(7): 1193-1202, 2019 05.
Article in English | MEDLINE | ID: mdl-29623869

ABSTRACT

OBJECTIVE: The present study tested the combination of an established and validated food-choice research method (the 'fake food buffet') with a new food-matching technology to automate the data collection and analysis. DESIGN: The methodology combines fake-food image recognition using deep learning with food matching and standardization based on natural language processing. The former is distinctive in that it uses a single deep learning network to perform both the segmentation and the classification at the pixel level of the image. To assess its performance, measures based on the standard pixel accuracy and Intersection over Union were applied. Food matching first describes each of the recognized food items in the image and then matches the food items with their compositional data, considering both their food names and their descriptors. RESULTS: The final accuracy of the deep learning model trained on fake-food images acquired by 124 study participants and providing fifty-five food classes was 92.18%, while the food matching was performed with a classification accuracy of 93%. CONCLUSIONS: The present findings are a step towards automating dietary assessment and food-choice research. The methodology outperforms other approaches in pixel accuracy, and since it is the first automatic solution for recognizing images of fake foods, the results could be used as a baseline for possible future studies. As the approach enables a semi-automatic description of recognized food items (e.g. with respect to FoodEx2), these can be linked to any food composition database that applies the same classification and description system.
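
Note on the measures: pixel accuracy and Intersection over Union (IoU), the two measures named in this abstract, can be computed from a ground-truth and a predicted label map. The following is a minimal sketch under the assumption that IoU is averaged over all classes present in either map; the study's exact variant is not given in the abstract, and the function names and 2 x 3 toy masks are hypothetical.

```python
def _pairs(truth, pred):
    """Flatten two equally shaped label maps into (truth, pred) pixel pairs."""
    return [(t, p) for t_row, p_row in zip(truth, pred)
            for t, p in zip(t_row, p_row)]


def pixel_accuracy(truth, pred):
    """Fraction of pixels whose predicted class equals the true class."""
    pairs = _pairs(truth, pred)
    return sum(t == p for t, p in pairs) / len(pairs)


def mean_iou(truth, pred):
    """Mean over classes of |intersection| / |union| of the class masks."""
    pairs = _pairs(truth, pred)
    classes = {c for pair in pairs for c in pair}
    ious = []
    for c in classes:
        inter = sum(t == c and p == c for t, p in pairs)
        union = sum(t == c or p == c for t, p in pairs)  # >= 1 by construction
        ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy 2 x 3 label maps (illustration only): 0 = background, 1 = a food class.
truth = [[0, 0, 1], [0, 1, 1]]
pred = [[0, 1, 1], [0, 1, 1]]
acc = pixel_accuracy(truth, pred)
miou = mean_iou(truth, pred)
```

In this toy case one mislabelled pixel lowers pixel accuracy to 5/6 but mean IoU to 17/24, which illustrates why IoU is the stricter measure when classes are small.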


Subjects
Deep Learning , Diet Records , Image Processing, Computer-Assisted , Natural Language Processing , Algorithms , Food Preferences , Humans , Nutrition Assessment
20.
Food Chem ; 277: 382-390, 2019 Mar 30.
Article in English | MEDLINE | ID: mdl-30502161

ABSTRACT

To link and harmonize different knowledge repositories with respect to isotopic data, we propose the ISO-FOOD ontology, a domain ontology for describing isotopic data within Food Science. The ISO-FOOD ontology consists of metadata and provenance data that need to be stored together with data elements in order to describe isotopic measurements with all the information required for future analysis. The new domain ontology has been linked with existing ontologies, such as the Units of Measurements Ontology, Food, Nutrient and the Bibliographic Ontology. To show how such an ontology can be used in practice, it was populated with 20 isotopic measurements of Slovenian food samples. Describing data in this way offers a powerful technique for organizing and sharing stable isotope data across Food Science.


Subjects
Databases, Factual , Food Technology , Isotopes/classification , Vocabulary, Controlled , Isotopes/chemistry , Metadata