ABSTRACT
This study develops a novel shrinkage frog-leaping algorithm based on double-loop contraction and C-value sorting selection (double-contractive cognitive random field [DC-CRF]) to mitigate the interference of complex salts and ions in seawater on the ultraviolet-visible (UV-Vis) absorbance spectra used for chemical oxygen demand (COD) quantification. DC-CRF introduces two key innovations: variable importance evaluation via the C value, which guides wavelength selection and accelerates convergence; and a double-loop structure that integrates random frog (RF) leaping with contraction attenuation to dynamically balance convergence speed and efficiency. Using seawater samples from Jiaozhou Bay, DC-CRF-partial least squares regression (PLSR) reduced the input variables by 97.5% after 1,600 iterations relative to full-spectrum PLSR, RF-PLSR, and CRF-PLSR. It achieved a test R² of 0.943 and a root mean square error of 1.603, markedly improving prediction accuracy and efficiency. This work demonstrates the efficacy of DC-CRF-PLSR in enhancing UV-Vis spectroscopy for rapid COD analysis in intricate seawater matrices, providing an efficient solution for optimizing seawater spectra.
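The figures of merit quoted in this abstract (test R², root mean square error, percentage reduction in input variables) can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function names and the sample values are hypothetical and are not taken from the study.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def variable_reduction_percent(n_full, n_selected):
    """Share of input variables removed by wavelength selection, in percent."""
    return 100.0 * (n_full - n_selected) / n_full

# Hypothetical reference and predicted COD values (not data from the study).
cod_measured = [1.0, 2.0, 3.0, 4.0]
cod_predicted = [1.1, 1.9, 3.2, 3.8]

print(r_squared(cod_measured, cod_predicted))
print(rmse(cod_measured, cod_predicted))
# Hypothetical example: keeping 10 of 400 wavelengths is a 97.5% reduction.
print(variable_reduction_percent(400, 10))
```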
Subjects
Algorithms, Seawater, Biological Oxygen Demand Analysis, Spectrum Analysis, Least-Squares Analysis
ABSTRACT
Potato is the world's fourth-largest food crop, following rice, wheat, and maize. Unlike other crops, it is a typical root crop with a special growth cycle pattern and underground tubers, which makes it harder to track crop progress and to automate crop management. Classifying growth stages is therefore important for timely management in the potato field. This paper studies how to classify the growth stage of potato crops accurately using spectroscopy. To develop a classification model that monitors the growth stage, field experiments were conducted at the tillering stage (S1), tuber formation stage (S2), tuber bulking stage (S3), and tuber maturation stage (S4). After spectral data pre-processing, the dynamic changes in chlorophyll content and spectral response during growth were analyzed. A classification model was then established using the support vector machine (SVM) algorithm based on spectral bands and the wavelet coefficients obtained from the continuous wavelet transform (CWT) of reflectance spectra. The spectral variables, comprising sensitive spectral bands and feature wavelet coefficients, were optimized using three selection algorithms to improve classification performance: correlation analysis (CA), the successive projection algorithm (SPA), and the random frog (RF) algorithm. The resulting models were compared, and the CWT-SPA-SVM model exhibited excellent performance. Its classification accuracies on the training set (Atrain) and the test set (Atest) were 100% and 97.37%, respectively, demonstrating good classification capability. The difference between Atrain and the cross-validation accuracy (Acv) was 1%, indicating good stability. Therefore, the CWT-SPA-SVM model can be used to classify the growth stages of potato crops accurately. This study provides an important supporting method for growth-stage classification in the potato field.
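The successive projection step named in this abstract can be sketched as follows. This is a minimal illustration of the projection idea only, assuming a fixed starting column and a fixed number of variables; the usual outer loop that chooses both by validation is omitted, and the function name and toy matrix are hypothetical.

```python
def spa_select(X, start, n_select):
    """Successive projections: pick columns with minimal mutual collinearity.

    X is a list of rows (samples x variables). n_select should not exceed
    the rank of X, otherwise a zero-norm column could become the reference.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Work on a column-wise copy so the projections do not alter X.
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]
        ref_norm_sq = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Project column j onto the subspace orthogonal to the last pick.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = sum(v * v for v in cols[j]) ** 0.5
            if norm > best_norm:
                best, best_norm = j, norm
        # The column with the largest residual norm carries the most new information.
        selected.append(best)
    return selected

# Toy matrix: column 1 is exactly twice column 0, column 2 is not collinear.
X_toy = [[1.0, 2.0, 1.0],
         [2.0, 4.0, 0.0],
         [3.0, 6.0, -1.0]]
print(spa_select(X_toy, start=0, n_select=2))
```

Starting from column 0, the collinear column 1 projects to zero and is skipped in favor of column 2.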
ABSTRACT
Retention indices of frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine, combined with a new variable selection approach called random-frog that was recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validations were performed to ensure stability and predictive ability. All three methods yielded acceptable models. The optimal results were obtained by support vector machine based on a small number of informative descriptors, with squared correlation coefficients for cross-validation of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, were compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and from selection probability in model spaces.
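The "selection probability" mentioned above can be illustrated with a heavily simplified random-frog-style search. The sketch below is not the published algorithm: it assumes a fixed subset size, swaps one variable per iteration, and scores subsets by the in-sample RMSE of an ordinary least squares fit rather than a cross-validated model. All names and the synthetic data are hypothetical.

```python
import random

def ols_rmse(X, y, cols):
    """In-sample RMSE of an ordinary least squares fit on the chosen columns."""
    n = len(y)
    A = [[1.0] + [row[c] for c in cols] for row in X]  # intercept + selected
    p = len(A[0])
    # Augmented normal equations: (A^T A | A^T y).
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            for c in range(k, p + 1):
                M[r][c] -= f * M[k][c]
    # Back substitution for the regression coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (M[k][p] - sum(M[k][c] * b[c] for c in range(k + 1, p))) / M[k][k]
    resid = [y[i] - sum(A[i][j] * b[j] for j in range(p)) for i in range(n)]
    return (sum(e * e for e in resid) / n) ** 0.5

def random_frog_sketch(X, y, subset_size=3, n_iter=2000, seed=0):
    """Return, per variable, the fraction of iterations it spent in the subset."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    current = rng.sample(range(n_vars), subset_size)
    current_err = ols_rmse(X, y, current)
    counts = [0] * n_vars
    for _ in range(n_iter):
        # Propose a neighbor: swap one selected variable for an unselected one.
        candidate = list(current)
        outside = [v for v in range(n_vars) if v not in current]
        candidate[rng.randrange(subset_size)] = rng.choice(outside)
        cand_err = ols_rmse(X, y, candidate)
        # Always accept improvements; accept worse subsets with small probability.
        if cand_err <= current_err or rng.random() < 0.1 * current_err / cand_err:
            current, current_err = candidate, cand_err
        for v in current:
            counts[v] += 1
    return [c / n_iter for c in counts]

# Synthetic data: only variables 0 and 1 drive the response.
data_rng = random.Random(1)
X_syn = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(40)]
y_syn = [2 * row[0] - row[1] + 0.05 * data_rng.gauss(0, 1) for row in X_syn]

probs = random_frog_sketch(X_syn, y_syn)
top_two = sorted(range(8), key=lambda v: -probs[v])[:2]
print(sorted(top_two))
```

Variables that stay in the subset for most iterations receive a high selection probability, which is the ranking criterion the abstract refers to.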
Subjects
Algorithms, Volatile Oils/analysis, Plants/chemistry, Least-Squares Analysis, Linear Models, Multivariate Analysis, Regression Analysis
ABSTRACT
The dimensionality of near-infrared (NIR) spectral data is often extremely large. Dimensionality reduction can effectively reduce the redundant information and correlation between spectral variables and simplify the model, which is crucial to improving model performance. As a nonlinear feature extraction method, Laplacian Eigenmaps (LE) preserves the local neighborhood information of the dataset, is highly robust, and is simple to compute. However, when the LE algorithm maps the data from high-dimensional space to low-dimensional space, it is often disturbed by irrelevant information and multicollinearity in the spectral data, which lowers the model's prediction performance. The Random Frog (RF) algorithm can eliminate noise and collinearity in the spectrum. Therefore, before applying the LE algorithm, we use the RF algorithm to eliminate irrelevant information in the spectrum and reduce the correlation between the spectral variables, increasing the efficiency of the LE algorithm. We used the RF + LE algorithm to reduce the dimensionality of two public NIRS datasets (a soil dataset and a pharmaceutical tablet dataset) and compared it with the RF and LE algorithms alone. We used Partial Least Squares Regression (PLSR) and Support Vector Regression (SVR) to establish regression models. The experimental findings demonstrate that, compared with the RF algorithm and the LE algorithm alone, the RF + LE combination can reduce the dimension of spectral variables and model complexity and improve the prediction accuracy and stability of the regression models. It is an effective dimensionality reduction method for near-infrared spectra.
Subjects
Algorithms, Near-Infrared Spectroscopy, Near-Infrared Spectroscopy/methods, Least-Squares Analysis, Soil
ABSTRACT
Citrus is one of the most important fruits in China. Miyagawa Satsuma, a citrus variety, is a nutritious agricultural product with regional characteristics of Chongming Island. Near-infrared (NIR) spectroscopy is well suited to studying fruit quality because it is low-cost, efficient, non-destructive, and repeatable. Therefore, the NIR technique is used in this study to detect the soluble solid content (SSC) of citrus. After the original spectral data are obtained, the first 70% of the samples are assigned to the training set and the remaining 30% to the test set. Then, the Random Frog algorithm is used to select characteristic wavelengths, which reduces the dimension of the data and the complexity of the model and accordingly improves the generalization of the classification model. After comparing the performance of various classifiers (AdaBoost, KNN, LS-SVM, and Bayes) under different numbers of characteristic wavelengths, the AdaBoost classifier with 275 characteristic wavelengths performs best. The accuracy, precision, recall, and F1-score are 78.3%, 80.5%, 78.3%, and 0.780, respectively, and the receiver operating characteristic (ROC) curve is close to the upper left corner, suggesting that the classification model is acceptable. The results demonstrate that it is feasible to use the NIR technique to estimate whether the citrus is sweet or not. Furthermore, the obtained models can help identify citrus quality correctly. For fruit traders, the model helps determine the growth cycle of citrus more scientifically, improve citrus cultivation and management and the final fruit quality, and thus increase their income.
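The four classification scores quoted above can be computed from true and predicted labels as sketched below. The abstract does not state how precision, recall, and F1 were averaged over the two classes; this sketch assumes support-weighted averaging, under which recall always equals accuracy (as the reported 78.3% pair would suggest). The function name and labels are hypothetical.

```python
def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        support = sum(t == label for t in y_true)      # true members of the class
        predicted = sum(p == label for p in y_pred)    # samples assigned to the class
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support / n
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return accuracy, precision, recall, f1

# Hypothetical labels: 1 = sweet, 0 = not sweet (not data from the study).
labels_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
labels_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

acc, prec, rec, f1 = weighted_metrics(labels_true, labels_pred)
print(acc, prec, rec, f1)
```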
ABSTRACT
Pesticide residues in food have been a grave concern to consumers. Herein, we developed a dual-mode SERS chip using Cu2O mesoporous spheres decorated with Ag nanoparticles (MCu2O@Ag NPs) as both a sensing and a degradation/clearing unit for rapid detection of pymetrozine and thiram pesticides in tea samples. Three chemometric algorithms were applied and compared for analyzing the collected SERS spectra of the pesticides. Random frog-partial least squares achieved the best performance, with root mean square error of prediction and residual predictive deviation values of 0.9871 and 6.17 for pymetrozine and 0.9873 and 6.64 for thiram, respectively. Additionally, the prepared SERS chip showed great photocatalytic activity, degrading pesticides under visible light irradiation. Through a facile method, this work presents a novel dual-functional SERS chip for the rapid detection and degradation of low-concentration pesticides in both environmental and food samples.
Subjects
Food Contamination/analysis, Pesticide Residues/chemistry, Raman Spectrum Analysis/methods, Algorithms, Light, Limit of Detection, Metal Nanoparticles/chemistry, Silver/chemistry, Raman Spectrum Analysis/instrumentation
ABSTRACT
The tuber development and nutrient transportation of potato crops are closely related to canopy photosynthesis dynamics. Chlorophyll fluorescence parameters of photosystem II, especially the maximum quantum yield of primary photochemistry (Fv/Fm), are intrinsic indicators of plant photosynthesis. Rapid detection of leaf Fv/Fm by spectroscopy, instead of the time-consuming pulse amplitude modulation technique, could help indicate potato development dynamics and guide field management. Accordingly, this study aims to extract fluorescence signals from hyperspectral reflectance to detect Fv/Fm. A hyperspectral imaging system and a closed chlorophyll fluorescence imaging system were used to collect the spectral data and Fv/Fm values of 176 samples. The spectral data were decomposed by continuous wavelet transform (CWT) to obtain wavelet coefficients (WFs). Three mother wavelet functions, the second derivative of Gaussian (gaus2), biorthogonal 3.3 (bior3.3), and reverse biorthogonal 3.3 (rbio3.3), were compared, and bior3.3 showed the best correlation with Fv/Fm. Two variable selection algorithms, the Monte Carlo uninformative variable elimination (MC-UVE) algorithm and the random frog (RF) algorithm, were used to select WFs sensitive to Fv/Fm. Partial least squares (PLS) regression was then used to establish detection models, labeled bior3.3-MC-UVE-PLS and bior3.3-RF-PLS, respectively. The determination coefficients of the prediction set for bior3.3-MC-UVE-PLS and bior3.3-RF-PLS were 0.8071 and 0.8218, respectively, and the root mean square errors of the prediction set were 0.0181 and 0.0174, respectively. The bior3.3-RF-PLS model had the best detection performance, and the corresponding WFs were mainly distributed in the bands affected by fluorescence emission (650-800 nm), chlorophyll absorption, and reflection. Overall, this study demonstrated the potential of CWT for extracting fluorescence signals and can serve as a guide for the quick detection of chlorophyll fluorescence parameters.
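The decomposition of a spectrum into wavelet coefficients can be sketched with a direct-summation CWT. The example below assumes a Mexican hat mother wavelet, which is proportional to the negated second derivative of a Gaussian and therefore closest in spirit to gaus2; it is not the bior3.3 wavelet the study found best, and the synthetic spectrum is hypothetical.

```python
import math

def mexican_hat(t):
    """Unnormalized Mexican hat wavelet: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_row(signal, scale):
    """Wavelet coefficients of `signal` at one scale, for every shift b."""
    n = len(signal)
    coeffs = []
    for b in range(n):
        # Correlate the signal with the wavelet dilated by `scale`, centered at b.
        total = sum(signal[i] * mexican_hat((i - b) / scale) for i in range(n))
        coeffs.append(total / math.sqrt(scale))
    return coeffs

# Hypothetical spectrum: flat baseline plus one Gaussian feature at index 50.
spectrum = [0.2 + math.exp(-0.5 * ((i - 50) / 5.0) ** 2) for i in range(101)]

coeffs = cwt_row(spectrum, scale=5.0)
peak_index = max(range(len(coeffs)), key=coeffs.__getitem__)
print(peak_index)
```

Repeating `cwt_row` over several scales yields the scale-by-wavelength coefficient matrix from which sensitive coefficients would then be selected.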
Subjects
Solanum tuberosum, Wavelet Analysis, Chlorophyll, Fluorescence, Least-Squares Analysis, Plant Leaves
ABSTRACT
Current methods for detecting aflatoxin contamination of agricultural and food commodities are generally based on wet chemical analyses, which are time-consuming, destructive to test samples, and require skilled personnel, making them unsuitable for large-scale nondestructive screening and on-site detection. In this study, we used visible-near-infrared (Vis-NIR) spectroscopy over the spectral range of 400-2500 nm to detect contamination of commercial, shelled peanut kernels (runner type) with the predominant aflatoxin B1 (AFB1). The artificially contaminated samples were prepared by dropping known amounts of aflatoxin standard dissolved in 50:50 (v/v) methanol/water onto the peanut kernel surface to achieve different contamination levels. The partial least squares discriminant analysis (PLS-DA) models established using the full spectra over different ranges achieved good prediction results. The best overall accuracies of 88.57% and 92.86% were obtained using the full spectra when taking 20 and 100 parts per billion (ppb), respectively, as the classification threshold. The random frog (RF) algorithm was used to find the optimal characteristic wavelengths for identifying surface AFB1 contamination of peanut kernels. Using the optimal spectral variables determined by the RF algorithm, simplified RF-PLS-DA classification models were established. The better-performing RF-PLS-DA models attained overall accuracies of 90.00% and 94.29% with the 20 ppb and 100 ppb thresholds, respectively, an improvement over the models using the full spectral variables. The simplified RF-PLS-DA models also used at least 94.82% fewer spectral variables than the full-spectrum models. The present study demonstrated that the Vis-NIR spectroscopic technique combined with appropriate chemometric methods could be useful in identifying AFB1 contamination of peanut kernels.
Subjects
Aflatoxin B1/analysis, Arachis/chemistry, Aspergillus flavus/metabolism, Food Contamination, Arachis/microbiology, Near-Infrared Spectroscopy/methods
ABSTRACT
Potassium is one of the most crucial minerals in infant formula, supporting the healthy growth and development of infants. Here, a novel strategy for the real-time quantification of potassium in infant formula samples is introduced. Using laser-induced breakdown spectroscopy (LIBS) in a data-driven approach, named DD-LIBS, a modified random frog algorithm (MRFA) is applied in a higher-density discrete wavelet transform (HDWT) domain to select the most important features related to potassium. In DD-LIBS, the HDWT oversamples the LIBS signals in both time and frequency domains by a factor of two, enhancing the spectral expandability in an approximately shift-invariant way. The MRFA is thus capable of isolating the features of potassium with experience accumulated from the collected LIBS data. Such pretreatment, combined with a partial least squares (PLS) model, can significantly suppress the uncontrolled shift and broadening effects on multivariate calibration, improving the capability of LIBS for accurate quantification of potassium. The present work demonstrates the feasibility of DD-LIBS for quantifying the potassium content of 90 commercial infant formula samples. The satisfactory results show that DD-LIBS is a feasible tool for real-time analysis of potassium content with little sample preparation. This strategy may be extended to the detection of other elements in the presence of uncontrolled interference.
ABSTRACT
Near-infrared spectroscopy (NIRS) combined with chemometrics can achieve rapid detection in process analysis. Variable selection effectively removes redundant information and retains the characteristic variables related to the response values. Compared with the global model, model complexity is significantly reduced and prediction accuracy is improved. In this study, NIRS combined with different variable selection methods was applied to the rapid detection of baicalin in the extraction process of Scutellaria baicalensis. The data sets were divided using the sample set partitioning based on joint x-y distance (SPXY) method. Competitive adaptive reweighted sampling (CARS), random frog (RF), and the successive projections algorithm (SPA) were applied for variable selection. Partial least squares (PLS) models were constructed based on these three methods, and the prediction results were compared. CARS, RF, and SPA screened out 92, 10, and 17 variables, respectively. According to the performance of the models, CARS was found to be more effective and suitable than RF and SPA. Furthermore, the characteristic variables selected by CARS correspond better to the chemical structure of baicalin. The root mean square error of the calibration set (RMSEC) and the root mean square error of the prediction set (RMSEP) are 0.5282 and 0.7202, respectively. Compared with the global PLS model, the correlation coefficient of the calibration set (Rc) increased from 0.9170 to 0.9799, and the relative standard error of prediction (RSEP) decreased from 10.58% to 5.59%.
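The SPXY split used to divide the data can be sketched as a Kennard-Stone-style max-min selection on a joint distance, where spectral and response distances are each scaled by their maximum before being added. This is a minimal illustration under that assumption; the function name and the toy data are hypothetical and unrelated to the baicalin measurements.

```python
def spxy_select(X, y, n_select):
    """Pick calibration samples by max-min selection on the joint x-y distance."""
    n = len(X)

    def dx(p, q):
        # Euclidean distance between the spectra of samples p and q.
        return sum((a - b) ** 2 for a, b in zip(X[p], X[q])) ** 0.5

    def dy(p, q):
        # Absolute difference between the response values of samples p and q.
        return abs(y[p] - y[q])

    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    max_dx = max(dx(p, q) for p, q in pairs)
    max_dy = max(dy(p, q) for p, q in pairs)

    def dxy(p, q):
        # Joint distance: both parts are scaled to [0, 1] so they weigh equally.
        return dx(p, q) / max_dx + dy(p, q) / max_dy

    # Start from the two samples that are farthest apart.
    selected = list(max(pairs, key=lambda pq: dxy(*pq)))
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        # Add the sample whose nearest selected neighbor is farthest away.
        nxt = max(remaining, key=lambda i: min(dxy(i, s) for s in selected))
        selected.append(nxt)
    return selected

# Toy data: one-variable "spectra" with matching response values.
X_toy_spxy = [[0.0], [1.0], [2.0], [10.0]]
y_toy_spxy = [0.0, 1.0, 2.0, 10.0]

calibration = spxy_select(X_toy_spxy, y_toy_spxy, n_select=3)
prediction = [i for i in range(len(X_toy_spxy)) if i not in calibration]
print(calibration, prediction)
```

Samples not selected for calibration form the prediction set.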