1.
J Agric Food Chem ; 72(14): 8060-8071, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38533667

ABSTRACT

Smoke taint has become a critical issue in the wine industry because of its significant negative impact on wine quality. Data-driven approaches, including univariate analysis and predictive modeling, are applied to a data set containing concentrations of 20 volatile organic compounds (VOCs) in 48 grape samples and 56 corresponding wine samples with a taster-evaluated smoke taint index. The resulting models for predicting the smoke taint index of wines are highly predictive when log-converted VOC concentrations in both grapes and wines are used as inputs (Pearson correlation coefficient, PCC = 0.82; R² = 0.68) and less so when only grape VOCs are used (PCC = 0.76; R² = 0.56). The classification models can also detect smoke-tainted wines using both wine and grape VOC concentrations (Recall = 0.76; Precision = 0.92; F1 = 0.82) or only grape VOC concentrations (Recall = 0.74; Precision = 0.92; F1 = 0.80). The performance of the predictive model suggests that the smoke taint index of a wine could be predicted from grape samples before fermentation. The code for the data analysis and predictive modeling of smoke taint in wine is available in the GitHub repository (https://github.com/IBPA/smoke_taint_prediction).
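The regression metrics quoted in this abstract (PCC and R²) and the log conversion of the inputs can be illustrated with a minimal sketch. This is not the code from the authors' repository: the function names, the use of the natural logarithm, and the `offset` added before taking the log are all assumptions made for illustration.

```python
import math

def log_convert(values, offset=1.0):
    """Log-transform concentrations.

    The offset (an assumption, not from the abstract) keeps zero
    concentrations finite: log(0 + 1) = 0.
    """
    return [math.log(v + offset) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In a workflow of this kind, `log_convert` would be applied to each VOC column before fitting, and `pearson` and `r_squared` would be computed between the taster-evaluated index and the model's predictions on held-out samples.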


Subject(s)
Vitis , Volatile Organic Compounds , Wine , Wine/analysis , Volatile Organic Compounds/analysis , Smoke/analysis , Fruit/chemistry , Nicotiana
2.
PLoS One ; 17(11): e0278121, 2022.
Article in English | MEDLINE | ID: mdl-36449508

ABSTRACT

Tax audits are a crucial process adopted in all tax departments to ensure tax compliance and fairness. Traditionally, tax audit leads have been selected based on empirical rules and randomization methods, which are not adaptive, may miss major cases and can introduce bias. Here, we present an audit lead tool based on artificial neural networks that have been trained and evaluated on an integrated dataset of 93,413 unique tax records from 8,647 restaurant businesses over 10 years in Northern California, provided by the California Department of Tax and Fee Administration (CDTFA). The tool achieved 40.1% precision and 58.7% recall (F1-score of 0.42) in classifying positive audit leads, and the corresponding regressor provided estimated audit gains (MAE of $155,490). Finally, we evaluated the statistical significance of various empirical rules for use in lead selection, with two out of five being supported by the data. This work demonstrates how data can be leveraged to create evidence-based models of audit selection and to validate empirical hypotheses, resulting in higher audit yields and fairer audit selection processes.
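For readers unfamiliar with the metrics reported above, the following sketch shows the standard definitions of precision, recall, F1 and mean absolute error (MAE). It is illustrative only: the function names and the 0/1 label encoding are assumptions, and the abstract does not say how the reported scores were aggregated (for example, across cross-validation folds).

```python
def classification_scores(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here a "positive" would be a record flagged as an audit lead, and the MAE would be measured in dollars between the actual and estimated audit gains.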


Subject(s)
Commerce , Neural Networks, Computer , Mental Recall , Restaurants
3.
Database (Oxford) ; 2020, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31976536

ABSTRACT

Breathomics is a branch of metabolomics that quantifies volatile organic compounds (VOCs) in collected exhaled breath samples. Understanding how breath molecules are related to diseases, mechanisms and pathways identified from experimental analytical measurements is challenging because there is no organized resource describing breath molecules, related references and the biomedical information embedded in the literature. To provide breath VOCs, related references and biomedical information, we aimed to build a database composed of manually curated information and automatically extracted biomedical information. First, VOC-related disease information was manually organized from 207 publications linked to 99 VOCs and known Medical Subject Headings (MeSH) terms. Then an automated text mining algorithm was used to extract biomedical information from this literature. Finally, the manually curated information and the automatically extracted biomedical information were combined to form a breath molecule database, the Human Breathomics Database (HBDB). We first manually curated and organized disease information, including MeSH terms, from 207 publications associated with 99 VOCs. Then, an automatic text mining pipeline was used to collect 2766 publications and extract biomedical information from breath research. We combined the curated information with the automatically extracted biomedical information to assemble a breath molecule database, the HBDB. The HBDB includes references, VOCs and diseases associated with human breathomics. Most of these VOCs were detected in human breath samples or exhaled breath condensate samples. So far, the database contains a total of 913 VOCs related to human exhaled breath research reported in 2766 publications. The HBDB is the most comprehensive database of VOCs in human exhaled breath to date. It is a useful and organized resource for researchers and clinicians to identify and further investigate potential biomarkers from the breath of patients. Database URL: https://hbdb.cmdm.tw.


Subject(s)
Database Management Systems , Exhalation/physiology , Metabolome/physiology , Metabolomics/methods , Volatile Organic Compounds , Breath Tests , Data Mining , Humans , Volatile Organic Compounds/analysis , Volatile Organic Compounds/chemistry
4.
Anal Chem ; 88(21): 10395-10403, 2016 Nov 01.
Article in English | MEDLINE | ID: mdl-27673369

ABSTRACT

Two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC/TOF-MS) is superior for chromatographic separation and provides great sensitivity for complex biological fluid analysis in metabolomics. However, GC×GC/TOF-MS data processing is currently limited to vendor software and typically requires several preprocessing steps. In this work, we implement a web-based platform, which we call GC2MS, to facilitate the application of recent advances in GC×GC/TOF-MS, especially for metabolomics studies. The core processing workflow of GC2MS consists of blob/peak detection, baseline correction, and blob alignment. GC2MS treats GC×GC/TOF-MS data as images and clusters the pixels into blobs according to the brightness of each pixel to generate a blob table. GC2MS then aligns the blobs of two GC×GC/TOF-MS data sets according to their distance and similarity. The blob distance is the Euclidean distance between the first and second retention times of two blobs, and the blob similarity is the Pearson correlation coefficient of their two mass spectra. GC2MS also directly corrects the raw data baseline. The analytical performance of GC2MS was evaluated using GC×GC/TOF-MS data sets of Angelica sinensis compounds acquired under different experimental conditions and of human plasma samples. The results show that GC2MS is an easy-to-use tool for detecting peaks and correcting baselines, and that it is able to align GC×GC/TOF-MS data sets acquired under different experimental conditions. GC2MS is freely accessible at http://gc2ms.web.cmdm.tw.
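The blob distance and similarity defined in this abstract can be sketched as follows. This is not the GC2MS implementation: the `Blob` class, the function names, the thresholds and the greedy matching rule in `match_blobs` are hypothetical, since the abstract does not describe how distance and similarity are combined. The sketch also assumes that both retention times are expressed on comparable scales and that spectra are intensity lists on a shared m/z grid.

```python
import math
from dataclasses import dataclass

@dataclass
class Blob:
    rt1: float       # first-dimension retention time
    rt2: float       # second-dimension retention time
    spectrum: list   # intensities on a shared m/z grid (assumed)

def blob_distance(a, b):
    """Euclidean distance between the two retention times of two blobs."""
    return math.hypot(a.rt1 - b.rt1, a.rt2 - b.rt2)

def blob_similarity(a, b):
    """Pearson correlation coefficient of the two mass spectra."""
    x, y = a.spectrum, b.spectrum
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    norm = math.sqrt(sum((p - mean_x) ** 2 for p in x)
                     * sum((q - mean_y) ** 2 for q in y))
    return cov / norm if norm else 0.0  # flat spectrum: treat as no similarity

def match_blobs(reference, target, max_distance, min_similarity):
    """Hypothetical greedy alignment: pair each reference blob with the most
    similar unused target blob that lies within max_distance."""
    pairs, used = [], set()
    for i, ref in enumerate(reference):
        best_j, best_sim = None, min_similarity
        for j, cand in enumerate(target):
            if j in used or blob_distance(ref, cand) > max_distance:
                continue
            sim = blob_similarity(ref, cand)
            if sim >= best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs
```

Gating on distance first and ranking by spectral similarity second is one plausible design: retention times drift between runs, whereas the mass spectrum of a compound is comparatively stable.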


Subject(s)
Gas Chromatography-Mass Spectrometry/methods , Metabolomics/methods , Algorithms , Angelica sinensis/chemistry , Angelica sinensis/metabolism , Humans , Internet , Plasma/chemistry , Plasma/metabolism , Software , Workflow