Results 1 - 20 of 37
1.
Spectrochim Acta A Mol Biomol Spectrosc ; 311: 123976, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38330764

ABSTRACT

Starch is the main source of energy and nutrition. Therefore, some merchants illegally add cheaper starches to other types of starch or package cheaper starches as higher-priced ones to raise prices. In this study, 159 samples of commercially available wheat, potato, corn and sweet potato starch were identified and classified using multispectral techniques, namely near-infrared (NIR), mid-infrared (MIR) and Raman spectroscopy, combined with chemometrics, including pretreatment methods, characteristic wavelength selection methods and classification algorithms. The results indicate that all three spectral techniques can discriminate starch types, with Raman spectroscopy outperforming NIR and MIR spectroscopy. Models built after characteristic wavelength selection were generally more accurate than full-spectrum models, and two-dimensional correlation spectroscopy (2D-COS) gave better model performance than the other wavelength selection methods. Among the four classification methods, a convolutional neural network (CNN) exhibited the best prediction performance, achieving accuracies of 99.74%, 97.57% and 98.65% on NIR, MIR and Raman spectra, respectively, without pretreatment or characteristic wavelength selection.
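The abstract does not say how 2D-COS was turned into a wavelength selection rule. As a rough illustration only, the sketch below computes Noda's generalized synchronous and asynchronous correlation spectra from a stack of spectra and then ranks variables by their synchronous autopeaks (the diagonal). Treating the sample sequence as the perturbation and ranking by autopeak intensity are assumptions, not the authors' stated procedure, and `two_d_cos` and `top_autopeaks` are hypothetical helper names.

```python
import numpy as np

def two_d_cos(spectra):
    """Generalized 2D correlation spectra (Noda).

    spectra: array of shape (m, n) with m spectra ordered along the perturbation
    and n spectral variables. Returns (synchronous, asynchronous), both n x n.
    """
    y = np.asarray(spectra, dtype=float)
    m = y.shape[0]
    dyn = y - y.mean(axis=0)                 # dynamic spectra: remove the mean spectrum
    sync = dyn.T @ dyn / (m - 1)             # synchronous spectrum (symmetric)
    j, k = np.indices((m, m))
    with np.errstate(divide="ignore"):
        # Hilbert-Noda matrix: 0 on the diagonal, 1 / (pi * (k - j)) elsewhere
        noda = np.where(j == k, 0.0, 1.0 / (np.pi * (k - j)))
    asyn = dyn.T @ noda @ dyn / (m - 1)      # asynchronous spectrum (antisymmetric)
    return sync, asyn

def top_autopeaks(sync, n_select):
    """Indices of the n_select variables with the largest synchronous autopeaks."""
    return np.argsort(np.diag(sync))[::-1][:n_select]
```

Autopeaks equal the variance of each variable across the perturbation, so this ranking simply favors the most dynamic wavelengths; the asynchronous cross-peaks would be needed to reason about the order of spectral changes.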


Subject(s)
Spectroscopy, Near-Infrared , Starch , Spectroscopy, Near-Infrared/methods , Starch/chemistry , Chemometrics , Spectrum Analysis, Raman , Algorithms
2.
Food Chem X ; 21: 101141, 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38304045

ABSTRACT

Aroma is a key criterion in evaluating aromatic coconut water. The key aroma compounds and their sensory correlations were compared between Thailand Aromatic Green Dwarf (THD) and Cocos nucifera L. cv. Wenye No. 4 coconut water using an electronic nose (E-nose) and GC × GC-O-TOF-MS combined with chemometrics. Twenty-one volatile components of coconut water were identified by GC × GC-O-TOF-MS, and 5 key aroma compounds were analyzed by relative odor activity value (ROAV) and aroma extract dilution analysis. Moreover, combining the E-nose with orthogonal partial least squares was highly effective in discriminating between the two coconut water samples and identified the key sensors responsible for this differentiation. Additionally, the correlation between volatile compounds and sensory properties was established using partial least squares. The key aroma compounds of coconut water correlated positively with the corresponding sensory properties.
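Relative odor activity value is usually computed by scaling each compound's odor activity value (concentration divided by odor threshold) against the compound with the largest one. The sketch below follows that common definition; the compound names and numbers in the usage example are hypothetical, and the ROAV ≥ 1 cut-off for key contributors is a widely used convention rather than something the abstract states.

```python
def roav(concentrations, thresholds):
    """ROAV_i = 100 * (C_i / T_i) / max_j (C_j / T_j).

    concentrations: {compound: relative concentration}
    thresholds:     {compound: odor threshold in the same units}
    """
    oav = {name: c / thresholds[name] for name, c in concentrations.items()}
    top = max(oav.values())  # compound contributing most to the overall aroma
    return {name: 100.0 * v / top for name, v in oav.items()}

def key_aroma_compounds(roav_values, cutoff=1.0):
    """Compounds at or above the cutoff, sorted by decreasing ROAV."""
    hits = [(v, n) for n, v in roav_values.items() if v >= cutoff]
    return [n for v, n in sorted(hits, reverse=True)]
```

For example, with hypothetical concentrations {"compound_a": 12.0, "compound_b": 0.5, "compound_c": 0.004} and unit thresholds except 2.0 for compound_a, compound_a scores 100, compound_b about 8.3 and compound_c below 1.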

3.
Metabolites ; 13(8)2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37623830

ABSTRACT

Dendrobium officinale (D. officinale) is a precious medicinal species of the genus Dendrobium (Orchidaceae), and the product obtained from it by hot processing is called "Fengdou". Current research on the processing quality of D. officinale focuses mainly on chemical composition indicators such as polysaccharide and flavonoid contents; however, how its metabolites change during processing remains unclear. In this study, processing was divided into two stages with three key states: fresh stems, semi-finished products and "Fengdou" products. To investigate how processing affects the metabolites of D. officinale at different stages, metabolomics was combined with network pharmacology and molecular docking. UPLC-MS/MS analysis detected a total of 628 metabolites, 109 of which were identified as differential metabolites (VIP ≥ 1, |log2(FC)| ≥ 1). Network pharmacology analysis of these differential metabolites then selected 29 with potential pharmacological activity. By constructing a metabolite-target-disease network covering seven diseases, 14 key metabolites and nine important targets were screened. Seven metabolites with potential anticoagulant, hypoglycemic and tumor-inhibiting activities increased in relative abundance in the "Fengdou" product, and molecular docking indicated that seven metabolites may act on five important targets. Overall, processing can increase the content of some active metabolites of D. officinale and improve its medicinal quality to a certain extent.
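The differential-metabolite criteria quoted above (VIP ≥ 1 and |log2(FC)| ≥ 1) amount to a simple two-condition filter. A minimal sketch, assuming VIP scores already come from an (O)PLS-DA model and that fold change is the ratio of group mean intensities; the function name and the metabolite names are hypothetical.

```python
import math

def differential_metabolites(table, vip_min=1.0, log2fc_min=1.0):
    """Names passing both VIP >= vip_min and |log2(FC)| >= log2fc_min.

    table: iterable of (name, vip, mean_treated, mean_control) with positive means.
    """
    hits = []
    for name, vip, treated, control in table:
        log2fc = math.log2(treated / control)   # fold change on a log2 scale
        if vip >= vip_min and abs(log2fc) >= log2fc_min:
            hits.append(name)
    return hits
```

Taking the absolute log2 fold change keeps both up-regulated (FC ≥ 2) and down-regulated (FC ≤ 0.5) metabolites.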

4.
Food Chem X ; 18: 100745, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37397224

ABSTRACT

Sesame oil has a unique flavor and is very popular in Asian countries, which makes it a frequent target of adulteration. In this study, a comprehensive method for detecting sesame oil adulteration based on characteristic markers was developed. First, sixteen fatty acids, eight phytosterols and four tocopherols were used to construct an adulteration detection model, which screened out seven potentially adulterated samples. Confirmatory conclusions were then drawn from characteristic markers. Adulteration with rapeseed oil was confirmed in 4 samples using the characteristic marker brassicasterol, adulteration with soybean oil was confirmed in 1 sample using isoflavone, and adulteration of 2 samples with cottonseed oil was demonstrated by sterculic acid and malvalic acid. The results showed that sesame oil adulteration can be detected by screening for positive samples with chemometrics and verifying them with characteristic markers. This comprehensive method could provide a systematic approach to market supervision of edible oils.
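The two-step logic described above (chemometric screening first, marker confirmation only for flagged samples) can be sketched as a lookup. The marker-to-adulterant pairs come from the abstract; representing each marker as a simple detected/not-detected flag, and the helper name `confirm_adulteration`, are assumptions, since real confirmation depends on analytical detection limits the abstract does not give.

```python
# Marker-to-adulterant pairs as reported in the abstract.
MARKER_TO_ADULTERANT = {
    "brassicasterol": "rapeseed oil",
    "isoflavone": "soybean oil",
    "sterculic acid": "cottonseed oil",
    "malvalic acid": "cottonseed oil",
}

def confirm_adulteration(screened_positive, detected_markers):
    """Step 2: confirm adulterant oils only for samples flagged by the step-1 screen.

    screened_positive: result of the chemometric screening model (bool).
    detected_markers: names of characteristic markers found in the sample (assumed flags).
    """
    if not screened_positive:
        return set()
    return {MARKER_TO_ADULTERANT[m] for m in detected_markers if m in MARKER_TO_ADULTERANT}
```

Keeping confirmation behind the screen mirrors the workflow in the abstract: the inexpensive multivariate model narrows the candidates, and marker analysis is reserved for those samples.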

5.
Foods ; 12(12)2023 Jun 20.
Article in English | MEDLINE | ID: mdl-37372626

ABSTRACT

Coconut water (CW) is a popular and healthful beverage, and ensuring its quality is crucial for consumer satisfaction. This study explored the potential of near-infrared spectroscopy (NIRS) and chemometric methods for analyzing CW quality and for distinguishing samples by postharvest storage time, cultivar and maturity. CW from nuts of the Wenye No. 2 and Wenye No. 4 cultivars grown in China, at varying postharvest storage times and maturities, was analyzed by NIRS. Partial least squares regression (PLSR) models developed to predict reducing sugar and soluble sugar contents showed moderate applicability but lacked accuracy, with residual prediction deviation (RPD) values ranging from 1.54 to 1.83. Models for total soluble solids (TSS), pH and TSS/pH performed poorly, with RPD values below 1.4, indicating limited predictability. However, orthogonal partial least squares discriminant analysis (OPLS-DA) models achieved a total correct classification rate exceeding 95%, effectively discriminating CW samples by postharvest storage time, cultivar and maturity. These findings highlight the potential of NIRS combined with appropriate chemometric methods as a valuable tool for analyzing CW quality and efficiently distinguishing samples, supporting quality control, consumer satisfaction and product integrity for coconut water.
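RPD is commonly computed as the standard deviation of the reference values in the validation set divided by the root-mean-square error of prediction (RMSEP). The sketch below uses the sample standard deviation (ddof=1); some authors use the population form, so small numerical differences are possible, and the data in the usage check are made up.

```python
import numpy as np

def rpd(y_ref, y_pred):
    """Residual prediction deviation = SD(reference values) / RMSEP."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmsep = np.sqrt(np.mean((y_ref - y_pred) ** 2))   # root-mean-square error of prediction
    return np.std(y_ref, ddof=1) / rmsep
```

Because RPD divides the natural spread of the property by the prediction error, a value near 1 means the model predicts little better than using the mean, which is why RPD values below 1.4 are read as poor.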

6.
Biosensors (Basel) ; 13(5)2023 May 06.
Article in English | MEDLINE | ID: mdl-37232882

ABSTRACT

Food analysis plays a vital role in ensuring the safety and quality of food products [...].


Subject(s)
Biosensing Techniques , Food Analysis , Technology
7.
Int J Food Microbiol ; 379: 109846, 2022 Oct 16.
Article in English | MEDLINE | ID: mdl-35908494

ABSTRACT

Pseudomonas fragi is primarily responsible for the spoilage of various foods, especially meat. This study investigated the antibacterial mechanism of 3-carene against P. fragi. 3-Carene treatment decreased the phospholipid content and fluidity of the cell membrane, induced reactive oxygen species (ROS) generation, and affected respiratory chain dehydrogenase, oxoglutarate dehydrogenase and citrate synthase in P. fragi. Metabolomic and proteomic analyses further showed that, in the presence of 3-carene, 519 proteins were differentially expressed and 136 metabolites (positive ion mode) and 100 metabolites (negative ion mode) were differentially abundant. These proteins and metabolites were primarily involved in amino acid metabolism, fatty acid degradation, the tricarboxylic acid (TCA) cycle and other processes. Thus, 3-carene treatment altered cell membrane properties, disturbed key amino acid and energy metabolism, and caused oxidative stress. Additionally, total viable counts and total volatile base nitrogen measurements indicated that 3-carene significantly improved the preservation of refrigerated pork. This study suggests that 3-carene has promising potential for development as a food preservative.


Subject(s)
Pork Meat , Pseudomonas fragi , Red Meat , Amino Acids/metabolism , Animals , Anti-Bacterial Agents/metabolism , Anti-Bacterial Agents/pharmacology , Bicyclic Monoterpenes , Metabolomics , Proteomics , Pseudomonas fragi/metabolism , Red Meat/microbiology , Swine
8.
Foods ; 10(9)2021 Sep 09.
Article in English | MEDLINE | ID: mdl-34574244

ABSTRACT

Partridge tea (Mallotus oblongifolius (Miq.) Müll.Arg.) is a local specialty tea of Hainan, the southernmost province of China, and its quality may be affected by the producing area. In this study, stable isotope analysis and targeted metabolomics combined with chemometrics were used as potential tools for identifying the origin of partridge tea. Elemental analysis-stable isotope ratio mass spectrometry and liquid chromatography-tandem mass spectrometry were used to analyze C/N/O/H stable isotope characteristics and 54 chemical components, including polyphenols and alkaloids, in partridge tea samples from four regions of Hainan (Wanning, Wenchang, Sanya and Baoting). Stable isotope ratios and polyphenol and alkaloid contents differed significantly among origins, and both types of data accurately classified partridge tea by origin. Principal component analysis showed correct separation and clustering of the samples, and the cross-validated Q2 values of orthogonal partial least squares discriminant analysis (OPLS-DA) were 0.949 (stable isotopes) and 0.974 (polyphenols and alkaloids). Potential indicators for origin identification were screened by OPLS-DA and the random forest algorithm, including three stable isotopes (δ13C, δD and δ18O) and four polyphenols (luteolin, protocatechuic acid, astragalin and naringenin). This study provides a preliminary guide for the origin identification of Hainan partridge tea.
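Stable isotope ratios such as δ13C, δD (δ2H) and δ18O are conventionally reported in delta notation, δ = (R_sample / R_standard − 1) × 1000 ‰, where R is the heavy-to-light isotope ratio. The conventional reference standards are VPDB for carbon, VSMOW for hydrogen and oxygen, and atmospheric N2 for nitrogen; the abstract does not list the standards or raw ratios used, so the numbers in the usage check are illustrative only.

```python
def delta_permil(r_sample, r_standard):
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0
```

A sample whose heavy-to-light ratio is 1% higher than the standard therefore has δ = +10 ‰, and one identical to the standard has δ = 0 ‰.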

9.
Curr Microbiol ; 78(5): 1730-1740, 2021 May.
Article in English | MEDLINE | ID: mdl-33704531

ABSTRACT

Washing rice water (WRW) is the wastewater produced by washing rice in daily life in China and other parts of Asia. Because WRW is rich in a variety of nutrients, microorganisms readily multiply in it and can pollute the environment. In this article, high-throughput sequencing was used to describe the microbial diversity of WRW at different fermentation times. The sequencing depth effectively covered the microbial species in the samples, and the bacterial communities of WRW at different fermentation periods were highly diverse. Predominant taxa included Proteobacteria (62%), Firmicutes (28%), Cyanobacteria (approximately 10%) and Bacteroidetes (0.5%). The core WRW microbiome comprised Trabulsiella, Pseudomonas, Serratia, Lactobacillus, Erwinia, Enterobacter, Clostridium and Acinetobacter, some of which are potentially beneficial microbes. Changes in microbial community composition across habitats were also assessed, and environmental factors were found to significantly influence the assembly of the microbial community.
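The abstract reports that sequencing depth covered the microbial species but does not name the coverage or diversity indices. Two that are widely used for amplicon data are Good's coverage, 1 − F1/N (F1 = number of singleton taxa, N = total reads), and the Shannon index, −Σ p_i ln p_i. The sketch below implements both from a list of per-taxon read counts; which indices the authors actually used is an assumption.

```python
import math

def goods_coverage(counts):
    """Good's coverage C = 1 - F1 / N (F1: taxa seen exactly once, N: total reads)."""
    n = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)
```

Coverage close to 1 suggests that few taxa remain unobserved at the current sequencing depth, while a higher Shannon index reflects both more taxa and a more even distribution of reads among them.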


Subject(s)
Microbiota , Oryza , China , High-Throughput Nucleotide Sequencing , Humans , Water
10.
Food Chem ; 348: 129129, 2021 Jun 30.
Article in English | MEDLINE | ID: mdl-33515952

ABSTRACT

The potential of two hyperspectral imaging systems, visible/near-infrared (Vis-NIR) and NIR, was investigated for determining total volatile basic nitrogen (TVB-N) contents in tilapia fillets during cold storage. Using the Vis-NIR and NIR data, calibration models were established between the average spectra of tilapia fillets extracted from the hyperspectral images and the corresponding TVB-N contents, and were optimized with various variable selection and data fusion methods. Variable selection methods applied to low-level fusion data produced better models than the same methods applied to single data blocks. Mid-level fusion data combined with competitive adaptive reweighted sampling (CARS) gave the best model of all. Finally, the optimized models for the single Vis-NIR and NIR data were used to visualize the distribution of TVB-N contents in tilapia fillets. Overall, the results demonstrate the strong feasibility of hyperspectral imaging combined with data fusion analysis for the nondestructive evaluation of tilapia fillet freshness.
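Low-level fusion concatenates the (preprocessed) full spectra from both instruments, whereas mid-level fusion concatenates features first extracted or selected from each block. A minimal sketch, assuming autoscaling as the only preprocessing and that both arrays have one row per fillet in the same order. The index lists stand in for variables that a selection method such as CARS would return (CARS itself is not implemented), and in a real calibration the scaling parameters should be estimated on the calibration set only.

```python
import numpy as np

def autoscale(block):
    """Column-wise mean-centring and unit-variance scaling (constant columns left unscaled)."""
    block = np.asarray(block, dtype=float)
    sd = block.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    return (block - block.mean(axis=0)) / sd

def low_level_fusion(vis_nir, nir):
    """Low-level fusion: concatenate the autoscaled full spectra of both instruments."""
    return np.hstack([autoscale(vis_nir), autoscale(nir)])

def mid_level_fusion(vis_nir, nir, vis_idx, nir_idx):
    """Mid-level fusion: concatenate only the variables selected from each block."""
    return np.hstack([autoscale(vis_nir)[:, vis_idx], autoscale(nir)[:, nir_idx]])
```

Scaling each block separately keeps the block with more variables or larger intensities from dominating the fused matrix; mid-level fusion additionally shrinks the fused matrix to the informative variables before regression.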


Subject(s)
Hyperspectral Imaging/methods , Seafood/analysis , Animals , Image Processing, Computer-Assisted , Spectroscopy, Near-Infrared , Tilapia/metabolism
11.
Foods ; 11(1)2021 Dec 30.
Article in English | MEDLINE | ID: mdl-35010218

ABSTRACT

Pseudomonas lundensis is the main bacterium responsible for meat spoilage, and controlling it is of great significance. 3-Carene, a natural monoterpene, has been shown to possess antimicrobial activity. This study investigated the antibacterial activity and mechanism of 3-carene against the meat spoilage bacterium P. lundensis and explored its application to pork. After 3-carene treatment, changes in cell structure were observed: cell walls and membranes were disrupted, resulting in leakage of alkaline phosphatase and cellular contents. Decreased Ca2+-Mg2+-ATPase and Na+-K+-ATPase activities indicated an imbalance of intracellular ions. Adenosine triphosphate (ATP) content and oxidative respiratory metabolism characteristics further indicated that 3-carene inhibited tricarboxylic acid cycle metabolism in P. lundensis. Results of binding 3-carene to key proteins (MurA, OmpW and AtpD) involved in cell wall formation, cell membrane composition and ATP synthesis further suggested that 3-carene may impair the normal function of these proteins. In addition, in pork pretreated with 3-carene, the growth of P. lundensis and the increase in pH were inhibited over 5 days of cold storage. These results demonstrate the activity of 3-carene against P. lundensis, elucidate its mechanism, and indicate its potential for meat preservation under refrigeration.

12.
Brief Bioinform ; 22(1): 474-484, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885044

ABSTRACT

BACKGROUND: With the rapid development of biotechnology and information technology, publicly available chemical and biological data are growing explosively. The wealth of information in these resources needs to be extracted and transformed into useful knowledge by various data mining methods. However, a major computational challenge is how to effectively represent or encode the molecular objects under investigation, such as chemicals, proteins, DNAs and even complicated interactions, when data mining methods are applied. To explore these complicated data further, an integrated toolkit that can represent different types of molecular objects and support various data mining algorithms is urgently needed. RESULTS: We developed a freely available R/CRAN package, called BioMedR, for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR can calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules, 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling, more than 6000 DNA descriptors from nucleotide sequences, and six types of interaction descriptors using three different combining strategies. Moreover, the package implements five similarity calculation methods and four powerful clustering algorithms, as well as several useful auxiliary tools, with the aim of building an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling. CONCLUSION: BioMedR provides a comprehensive and uniform R package that links different representations of molecular objects and will benefit cheminformatics, bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/.
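BioMedR itself is an R package, and its function names are not reproduced here. As an illustration of one widely used fingerprint similarity measure, the Tanimoto coefficient |A ∩ B| / |A ∪ B|, the Python sketch below compares binary fingerprints stored as sets of on-bit indices. Whether this exact measure is among BioMedR's five similarity methods is not stated in the abstract, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between binary fingerprints given as collections of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Convention (assumption): two empty fingerprints are treated as dissimilar.
    return len(a & b) / union if union else 0.0
```

Storing only the on-bits keeps the comparison cheap for sparse fingerprints; the same ratio can be computed on fixed-length bit vectors as popcount(A AND B) / popcount(A OR B).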


Subject(s)
Computational Biology/methods , Database Management Systems , Data Management/methods , Databases, Chemical , Databases, Genetic , Humans
13.
Int J Biol Macromol ; 167: 539-546, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33279566

ABSTRACT

This work investigated the effects of hot air drying (HAD), freeze drying (FD) and vacuum drying (VD) pretreatments on the physicochemical properties and structural characteristics of starch isolated from canistels. X-ray diffraction showed that the starches separated from canistel after the different drying pretreatments had a typical A-type crystal structure. SEM images showed cracks and debris on the surfaces of HAD and VD starch granules. The molecular structure of the starches obtained by the different drying pretreatments was studied using Fourier-transform infrared (FT-IR) spectroscopy and solid-state 13C CP/MAS NMR analysis. The results indicated that vacuum drying pretreatment could promote the formation of starch double helices, whereas hot air drying and freeze drying disrupted the ordered structure of the starch granules. These structural changes in turn affected the physicochemical properties of the starch granules. This comparison of drying pretreatments for starch isolation offers practical guidance on the choice of drying pretreatment. Furthermore, the study provides information on canistel starch cultivated in China that should facilitate its commercial applications.


Subject(s)
Sapotaceae/chemistry , Starch/chemistry , Carbohydrate Conformation , Desiccation , Molecular Structure , Spectroscopy, Fourier Transform Infrared , X-Ray Diffraction
14.
Metabolites ; 10(8)2020 Aug 10.
Article in English | MEDLINE | ID: mdl-32785071

ABSTRACT

Dendrobium officinale, a precious herbal medicine, has a long history of use in China. Its metabolites, regarded as the effective components against disease, are significantly affected by the cultivation substrate. In this study, ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) was used to analyze D. officinale stems cultured in three substrates: pine bark (PB), coconut coir (CC), and a 1:1 pine bark:coconut coir mix (PC). A total of 529 metabolites were identified. Multivariate statistical analysis was used to compare metabolite contents between groups. Using the criteria of variable importance in projection (VIP) ≥ 1 and absolute log2(fold change) ≥ 1, 68, 51 and 57 metabolites with significantly different contents were screened out between PB and PC, PB and CC, and PC and CC, respectively. Comparisons among the three groups revealed that flavonoids were the metabolites that fluctuated most, and D. officinale stems from the PB group had a higher flavonoid content. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis indicated that the significantly regulated metabolites were mainly associated with flavonoid biosynthesis. This study provides a comprehensive profile of the metabolic differentiation of D. officinale grown in different substrates, which supports the selection of an optimum cultivation substrate for a higher biomass yield of D. officinale.
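The abstract does not state which statistic underlies the KEGG enrichment analysis. A common choice for metabolite over-representation analysis is the one-sided hypergeometric test sketched below, where `total` is the number of annotated metabolites in the background, `in_pathway` the number mapped to a given pathway, `n_diff` the number of differential metabolites, and `diff_in_pathway` how many of those fall in the pathway; the function name and arguments are hypothetical.

```python
from math import comb

def enrichment_p(total, in_pathway, n_diff, diff_in_pathway):
    """Upper-tail hypergeometric p-value P(X >= diff_in_pathway)."""
    upper = min(in_pathway, n_diff)
    denom = comb(total, n_diff)
    # Probability of drawing i pathway members among n_diff differential metabolites
    return sum(comb(in_pathway, i) * comb(total - in_pathway, n_diff - i)
               for i in range(diff_in_pathway, upper + 1)) / denom
```

In practice the p-values across all tested pathways would also be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.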

15.
Foods ; 9(3)2020 Mar 17.
Article in English | MEDLINE | ID: mdl-32192035

ABSTRACT

Characteristic aromas are often key identifiers of food products. In this study, the volatile profiles and marker substances of coconut jam during concentration were characterized by sensory evaluation combined with headspace solid-phase microextraction-gas chromatography-tandem mass spectrometry (HSPME/GC-MS). A total of 33 aroma compounds were detected by HSPME/GC-MS. Principal component analysis revealed that the concentration process of coconut jam can be divided into three stages. In the first stage, esters and alcohols were the two main contributors to the aroma of coconut jam. In the second stage, a caramel odor gradually formed, derived mainly from aldehydes, ketones and alcohols; the concentration of aldehydes increased gradually during this stage, possibly as a combined result of the Maillard reaction and caramelization. In the final sterilization stage, the caramel 'odor intensity' reached its maximum and a variety of aroma compounds were produced, forming the unique flavor of coconut jam. Finally, furfural fit a logistic model with a coefficient of determination (r2) of 0.97034. Therefore, furfural can be used as a marker substance for monitoring the concentration of coconut jam.
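The abstract reports a logistic fit for furfural but not the exact parameterization. The sketch below assumes a three-parameter logistic curve, y = K / (1 + exp(−r (t − t0))), fitted with `scipy.optimize.curve_fit`, and reports the coefficient of determination; the time axis and the synthetic data in the usage check are assumptions, not the study's measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, k, r, t0):
    """Three-parameter logistic: plateau k, growth rate r, midpoint t0."""
    return k / (1.0 + np.exp(-r * (t - t0)))

def fit_logistic(t, y):
    """Fit the logistic curve and return (parameters, coefficient of determination)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    p0 = (y.max(), 1.0, np.median(t))          # rough starting guesses
    params, _ = curve_fit(logistic, t, y, p0=p0, maxfev=10000)
    resid = y - logistic(t, *params)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return params, r2
```

A logistic form suits a marker that accumulates slowly, rises quickly during the middle of processing and then levels off, which is the behavior expected of a concentration-monitoring indicator.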

16.
Molecules ; 25(3)2020 Feb 04.
Article in English | MEDLINE | ID: mdl-32033016

ABSTRACT

Vitamin E (VE) and β-cyclodextrin (β-CD) can form an inclusion complex; however, the inclusion rate is low because of the weak interaction between VE and β-CD. A molecular docking study showed that the oxygen atom in the five-membered ring of octenyl succinic anhydride (OSA) formed a strong hydrogen bond (1.89 Å) with the hydrogen atom of the C-6 hydroxyl group. Therefore, β-CD was modified with OSA to produce octenyl succinic-β-cyclodextrin (OCD), and inclusion complexes of OCD with VE were prepared. The properties of the inclusion complex were investigated by Fourier-transform infrared spectroscopy (FT-IR), 13C CP/MAS NMR, scanning electron microscopy (SEM) and atomic force microscopy (AFM). The results demonstrated that VE had been embedded in the cavity of OCD. Furthermore, the emulsifying properties (particle size distribution, ζ-potential and creaming index) of the emulsion stabilized by the OCD/VE inclusion complex were compared with those of emulsions stabilized by β-CD, OCD and an OCD/VE physical mixture. The introduction of the OS group and VE could improve the physical stability of the emulsion, and the OCD/VE inclusion complex showed the strongest ability to protect the oil in the emulsion from oxidation. Because the OCD/VE inclusion complex improves both the physical and oxidative stability of emulsions, it is of great significance to the food industry.
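The creaming index mentioned above is commonly calculated from the height of the separated serum layer relative to the total emulsion height after storage, CI (%) = 100 × H_serum / H_total. The helper below follows that common definition; the abstract does not give the authors' exact formula, storage time or measurement protocol.

```python
def creaming_index(serum_height, total_height):
    """Creaming index (%) = 100 * serum layer height / total emulsion height."""
    if total_height <= 0:
        raise ValueError("total_height must be positive")
    return 100.0 * serum_height / total_height
```

A lower creaming index after storage indicates less phase separation, which is how a more physically stable emulsion shows up in this measurement.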


Subject(s)
Emulsions/chemistry , Food Preservation/methods , Succinates/chemistry , Vitamin E/chemistry , beta-Cyclodextrins/chemistry , Antioxidants/chemistry , Food Industry/methods , Lipid Peroxidation/drug effects , Microscopy, Atomic Force , Molecular Docking Simulation , Nuclear Magnetic Resonance, Biomolecular , Spectroscopy, Fourier Transform Infrared
17.
Spectrochim Acta A Mol Biomol Spectrosc ; 224: 117376, 2020 Jan 05.
Article in English | MEDLINE | ID: mdl-31325711

ABSTRACT

Variable (feature or wavelength) selection is a critical step in multivariate calibration of near-infrared (NIR) spectra. High-resolution NIR instruments and NIR imaging instruments usually generate hundreds or thousands of wavelengths, which makes variable selection methods prone to overfitting, low efficiency, or high computational demands. It is therefore a great challenge to efficiently select informative variables and obtain an optimal variable combination from such a huge variable space. We propose a hybrid strategy for efficient variable selection based on three steps: rough selection, fine selection and optimal selection. First, a highly interpretable wavelength interval selection method, interval partial least squares (iPLS), was used to roughly select informative intervals and shrink the variable space. Second, wavelength point selection methods such as variable importance in projection (VIP) and modified variable combination population analysis (mVCPA) were used to shrink the variable space further, from large to small, so as to retain only the most important variables. Third, optimization methods such as iteratively retaining informative variables (IRIV) and the genetic algorithm (GA) were applied to find an optimal variable combination among the remaining variables. The strategy makes full use of the advantages of the methods involved and compensates for their disadvantages when facing high-dimensional data. Two NIR datasets were used to investigate the performance of the three-step hybrid strategy. Compared with other single or hybrid methods (iPLS, VIP, iPLS-VIP, iPLS-VCPA, iPLS-mVCPA, VIP-GA, VIP-IRIV, mVCPA-GA, mVCPA-IRIV), it significantly improved the prediction performance of the resulting models, indicating that three-step hybrid strategies, including iPLS-VIP-IRIV, iPLS-VIP-GA, iPLS-mVCPA-GA and iPLS-mVCPA-IRIV, can efficiently select informative variables.
Therefore, the three-step hybrid strategy is a good alternative variable selection approach for high-dimensional NIR spectral data.
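The rough-selection step can be illustrated with a minimal interval PLS (iPLS) sketch: the spectrum is split into equal-width intervals, a PLS1 model is cross-validated on each, and intervals are ranked by RMSECV. Equal-width intervals, interleaved 5-fold cross-validation, a fixed two-component model and the NIPALS implementation are assumptions for illustration; the later VIP/mVCPA and IRIV/GA steps are not shown.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        c = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def rmsecv(X, y, n_comp, n_folds=5):
    """Root-mean-square error of cross-validation with interleaved folds."""
    fold = np.arange(len(y)) % n_folds
    sq = 0.0
    for k in range(n_folds):
        test = fold == k
        xm, ym, b = pls1_fit(X[~test], y[~test], n_comp)
        sq += np.sum((y[test] - ((X[test] - xm) @ b + ym)) ** 2)
    return np.sqrt(sq / len(y))

def ipls_rank(X, y, n_intervals, n_comp=2):
    """Split variables into equal-width intervals; rank intervals by RMSECV (best first)."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    scored = [(rmsecv(X[:, iv], y, min(n_comp, iv.size)), int(iv[0]), int(iv[-1]))
              for iv in intervals]
    return sorted(scored)
```

Each entry of the ranking is (RMSECV, first variable, last variable); keeping the top few intervals is what shrinks the variable space before the finer point-selection step.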

18.
J Chem Inf Model ; 60(1): 63-76, 2020 01 27.
Article in English | MEDLINE | ID: mdl-31869226

ABSTRACT

Lipophilicity, as evaluated by the n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4), is a major determinant of various absorption, distribution, metabolism, elimination, and toxicology (ADMET) parameters of drug candidates. In this study, we developed several quantitative structure-property relationship (QSPR) models to predict log D7.4 based on a large and structurally diverse data set. Eight popular machine learning algorithms were used to build prediction models with 43 molecular descriptors selected by a wrapper feature selection method. XGBoost yielded better prediction performance than any other single model (test-set R2 = 0.906 and RMSE = 0.395). Moreover, a consensus model built from the top three models further improved prediction performance (test-set R2 = 0.922 and RMSE = 0.359). The robustness, reliability and generalization ability of the models were rigorously evaluated by the Y-randomization test and applicability domain analysis. In addition, a group contribution model based on 110 atom types and local models for different ionization states were established and compared with the global models. The descriptor-based consensus model was superior to the group contribution method, and the local models had no advantage over the global models. Finally, matched molecular pair (MMP) analysis and descriptor importance analysis were performed to extract transformation rules and help explain log D7.4. In conclusion, we believe that the consensus model developed in this study can serve as a reliable and promising tool for evaluating log D7.4 in drug discovery.
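A consensus model of the kind described above can be as simple as averaging the test-set predictions of the selected models. The sketch assumes equal weights and defines R2 as 1 − SS_res/SS_tot on the test set; the paper may weight models or define its test-set R2 differently (for example as a squared correlation), and the prediction arrays in the usage check are made up.

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """Coefficient of determination (1 - SS_res/SS_tot) and RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, rmse

def consensus(predictions):
    """Equal-weight consensus: average predictions of several models (rows = models)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)
```

Averaging helps when the individual models make partly independent errors, since those errors partially cancel; it gives little benefit if all models err in the same direction.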


Subject(s)
Machine Learning , Models, Molecular , Algorithms , Drug Discovery/methods , Lipids/chemistry , Quantitative Structure-Activity Relationship
19.
Anal Chim Acta ; 1058: 58-69, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-30851854

ABSTRACT

When analyzing high-dimensional near-infrared (NIR) spectral datasets, variable selection is critical to improving models' predictive abilities. However, some methods suffer from limitations such as a high risk of overfitting, long run times or large computational demands when dealing with many variables. In this study, we propose a hybrid variable selection strategy based on continuous shrinkage of the variable space, the core idea of variable combination population analysis (VCPA). In the first step, the VCPA-based hybrid strategy continuously shrinks the variable space from large to small and optimizes it using a modified VCPA. In the second step, it employs iteratively retaining informative variables (IRIV) and a genetic algorithm (GA) for further optimization. The strategy takes full advantage of VCPA, GA and IRIV and compensates for their drawbacks when facing large numbers of variables. Three NIR datasets and three variable selection methods, comprising two widely used methods (competitive adaptive reweighted sampling, CARS, and genetic algorithm-interval partial least squares, GA-iPLS) and one hybrid method (variable importance in projection coupled with a genetic algorithm, VIP-GA), were used to evaluate the improvement achieved by the VCPA-based hybrid strategy. The results show that VCPA-GA and VCPA-IRIV significantly improve models' prediction performance compared with the other methods, indicating that the modified VCPA step is a very efficient way to filter out uninformative variables and that the VCPA-based hybrid strategy is a promising approach to variable selection in NIR. The MATLAB source code of VCPA-GA and VCPA-IRIV can be freely downloaded from: https://cn.mathworks.com/matlabcentral/profile/authors/5526470-yonghuan-yun.
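The core VCPA idea of shrinking the variable space can be sketched as follows: binary matrix sampling (each variable enters each sub-model with probability 0.5), scoring the sub-models, computing each variable's frequency in the best 10% (elite) sub-models, and retaining the most frequent variables according to an exponentially decreasing schedule. This is a simplified sketch, not the authors' MATLAB code: ordinary least squares with a single hold-out split stands in for PLS with cross-validation, the schedule parameters are assumptions, and the modified-VCPA details, the final exhaustive search and the GA/IRIV step are omitted.

```python
import numpy as np

def holdout_rmse(X, y, cols, split):
    """Stand-in model score (not PLS): OLS fit on the first `split` samples, RMSE on the rest."""
    A = np.c_[np.ones(len(y)), X[:, cols]]
    coef, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)
    return np.sqrt(np.mean((A[split:] @ coef - y[split:]) ** 2))

def vcpa_shrink(X, y, n_iter=10, n_models=500, elite_ratio=0.1, final_vars=5, seed=0):
    """VCPA-style shrinkage: binary matrix sampling + elite frequencies + EDF schedule."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    split = int(0.7 * len(y))
    kept = np.arange(n_var)
    decay = np.log(n_var / final_vars) / n_iter   # EDF rate: n_var -> final_vars over n_iter steps
    for i in range(1, n_iter + 1):
        bms = rng.random((n_models, kept.size)) < 0.5   # each variable in ~half the sub-models
        bms[bms.sum(axis=1) == 0, 0] = True              # guard against empty sub-models
        scores = np.array([holdout_rmse(X, y, kept[row], split) for row in bms])
        n_elite = max(1, int(elite_ratio * n_models))
        freq = bms[np.argsort(scores)[:n_elite]].mean(axis=0)   # frequency in elite models
        n_keep = max(final_vars, int(round(n_var * np.exp(-decay * i))))
        kept = kept[np.argsort(freq)[::-1][:n_keep]]    # retain the most frequent variables
    return np.sort(kept)
```

Because informative variables appear in nearly every elite sub-model while uninformative ones appear in roughly half of them, their frequencies separate and the schedule can discard variables gradually rather than in a single risky cut.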

20.
Analyst ; 141(19): 5586-97, 2016 Oct 07.
Article in English | MEDLINE | ID: mdl-27435388

ABSTRACT

Variable selection and outlier detection are important processes in chemical modeling. They usually affect each other, and the order in which they are performed also strongly affects the modeling results. Currently, many studies perform these processes separately and in different orders. In this study, we examined the interaction between outliers and variables and compared modeling procedures that perform variable selection and outlier detection in different orders. Because the order of outlier detection and variable selection can affect the interpretation of the model, it is difficult to decide which order is preferable when the predictive abilities (prediction errors) of the different orders are relatively close. To address this problem, a simultaneous variable selection and outlier detection approach called Model Adaptive Space Shrinkage (MASS) was developed. The approach is based on model population analysis (MPA). Through weighted binary matrix sampling (WBMS) from the model space, a large number of partial least squares (PLS) regression models were built, and the elite models were selected to statistically reassign the weight of each variable and sample. The whole process was repeated until the weights of the variables and samples converged. Finally, MASS adaptively found a high-performance model consisting of an optimized variable subset and sample subset. The combination of these two subsets can be regarded as the cleaned dataset for chemical modeling. The proposed approach thus avoids the problem of ordering variable selection and outlier detection. One near-infrared spectroscopy (NIR) dataset and one quantitative structure-activity relationship (QSAR) dataset were used to test the approach. The results demonstrated that MASS is a useful method for data cleaning before building a predictive model.
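The MASS loop described above can be sketched as weighted binary matrix sampling over both variables and samples, with each weight reset to its inclusion frequency among the elite (lowest-error) sub-models until every weight reaches 0 or 1. In this simplified sketch an even/odd two-fold OLS error replaces the PLS cross-validation error used in the paper, and the elite ratio, iteration limit and helper names (`wbms`, `update_weights`, `ols_cv_rmse`, `mass`) are hypothetical.

```python
import numpy as np

def wbms(weights, n_models, rng):
    """Weighted binary matrix sampling: entry (i, j) is True with probability weights[j]."""
    return rng.random((n_models, len(weights))) < weights

def update_weights(matrix, scores, elite_ratio=0.1):
    """New weight of each column = its inclusion frequency among the lowest-error models."""
    n_elite = max(1, int(elite_ratio * len(scores)))
    elite = matrix[np.argsort(scores)[:n_elite]]
    return elite.mean(axis=0)

def ols_cv_rmse(Xs, ys):
    """Stand-in scorer (not PLS): even/odd 2-fold CV RMSE of OLS; inf for degenerate subsets."""
    n, p = Xs.shape
    if p == 0 or n < 2 * (p + 2):
        return np.inf
    A = np.c_[np.ones(n), Xs]
    sq = 0.0
    for fold in (0, 1):
        test = np.arange(n) % 2 == fold
        coef, *_ = np.linalg.lstsq(A[~test], ys[~test], rcond=None)
        sq += np.sum((A[test] @ coef - ys[test]) ** 2)
    return np.sqrt(sq / n)

def mass(X, y, score_fn, n_models=500, max_iter=50, seed=0):
    """Jointly shrink variable and sample spaces until all weights are 0 or 1."""
    rng = np.random.default_rng(seed)
    w_var = np.full(X.shape[1], 0.5)
    w_smp = np.full(X.shape[0], 0.5)
    for _ in range(max_iter):
        V = wbms(w_var, n_models, rng)   # variable subsets
        S = wbms(w_smp, n_models, rng)   # sample subsets
        scores = np.array([score_fn(X[np.ix_(s, v)], y[s]) for s, v in zip(S, V)])
        w_var, w_smp = update_weights(V, scores), update_weights(S, scores)
        if np.all(np.isin(w_var, (0.0, 1.0))) and np.all(np.isin(w_smp, (0.0, 1.0))):
            break
    # True = retained variable / sample (the 0.5 threshold only matters if not converged)
    return w_var >= 0.5, w_smp >= 0.5
```

Sampling variables and samples in the same sub-models is what lets the elite statistics judge both at once, so neither selection has to be fixed before the other.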
