Results 1 - 20 of 149
1.
J Anal Methods Chem ; 2017: 9402045, 2017.
Article in English | MEDLINE | ID: mdl-28168083

ABSTRACT

The rapid increase in the use of metabolite profiling/fingerprinting techniques to resolve complicated issues in metabolomics has stimulated demand for data processing techniques, such as alignment, to extract detailed information. In this study, a new and automated method was developed to correct the retention time shift of high-dimensional and high-throughput data sets. Information from the target chromatographic profiles was used to determine the standard profile as a reference for alignment. A novel, piecewise data partition strategy was applied for the determination of the target components in the standard profile as markers for alignment. An automated target search (ATS) method was proposed to find the exact retention times of the selected targets in other profiles for alignment. The linear interpolation technique (LIT) was employed to align the profiles prior to pattern recognition, comprehensive comparison analysis, and other data processing steps. In total, 94 metabolite profiles of ginseng were studied, including the most volatile secondary metabolites. The method used in this article could be an essential step in the extraction of information from high-throughput data acquired in the study of systems biology, metabolomics, and biomarker discovery.
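The final alignment step (mapping each profile onto the reference with matched target markers and linear interpolation) can be sketched as below. This is a minimal illustration, not the authors' ATS/LIT code: `interp`, `warp_time` and `align_profile` are hypothetical names, the marker retention times are assumed to have already been found by a target search, and times outside the first or last marker are shifted by that end marker's offset (an assumption).

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Linearly interpolate y at x on the ascending grid xs; clamp outside it."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])

def warp_time(t, ref_markers, sample_markers):
    """Map a reference retention time to the matching sample retention time.

    Between markers the mapping is piecewise linear; outside the marker range
    the offset of the nearest end marker is applied (an assumption).
    """
    if t <= ref_markers[0]:
        return t + (sample_markers[0] - ref_markers[0])
    if t >= ref_markers[-1]:
        return t + (sample_markers[-1] - ref_markers[-1])
    return interp(t, ref_markers, sample_markers)

def align_profile(times, intensities, ref_markers, sample_markers):
    """Resample a sample chromatogram onto the reference time axis `times`."""
    return [interp(warp_time(t, ref_markers, sample_markers), times, intensities)
            for t in times]
```

For example, if every marker elutes one time unit later in the sample than in the reference, a sample peak at t = 5 is moved to t = 4 on the reference axis.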

2.
Guang Pu Xue Yu Guang Pu Fen Xi ; 37(1): 95-102, 2017 01.
Article in Chinese | MEDLINE | ID: mdl-30192487

ABSTRACT

Near infrared spectroscopy (NIRS) is an indirect analytical technique whose application depends on building an appropriate calibration model. To improve the interpretability, accuracy and modeling efficiency of the prediction model, wavelength selection is very important, because it can minimize the redundant information in near infrared spectra. Intelligent optimization algorithms are a commonly used class of wavelength selection methods: they build an algorithmic model by mathematically abstracting biological behavior or the motion of matter, and then solve combinatorial optimization problems by iterative calculation. Their core strategy is to screen effective wavelength points for multivariate calibration by successive approximation, using an objective function as the criterion. In this work, five intelligent optimization algorithms, namely ant colony optimization (ACO), genetic algorithm (GA), particle swarm optimization (PSO), random frog (RF) and simulated annealing (SA), were used to select characteristic wavelengths from NIR data of tobacco leaf for the determination of total nitrogen and nicotine content, and were combined with partial least squares (PLS) to construct multiple calibration models. Comparative analysis of these models showed that the optimal total nitrogen models for datasets A and B were PSO-PLS and GA-PLS, respectively, and the optimal nicotine models were GA-PLS and SA-PLS, respectively. Although not all of these optimized models predicted better than the full-spectrum PLS models, they were greatly simplified, and their prediction accuracy, precision, interpretability and stability were improved. This research is therefore of significance for practical applications.
In addition, the informative wavelength ranges were 4587-4878 and 6700-7200 cm(-1) for total nitrogen and 4500-4700 and 5800-6000 cm(-1) for nicotine. These selected wavelengths have a clear physical meaning.
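As a rough illustration of how one of these algorithms (simulated annealing) searches over wavelength subsets, the sketch below toggles one wavelength in or out of a binary mask per step and accepts worse subsets with the Metropolis probability exp(-Δ/T). The function name, the geometric cooling schedule and its defaults are assumptions; in a real application `cost` would be something like the cross-validated error of a PLS model refitted on the kept wavelengths, which is supplied by the caller and not shown here.

```python
import math
import random

def anneal_wavelengths(n_wavelengths, cost, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over binary wavelength masks (True = wavelength kept).

    `cost` maps a mask to a number to minimize (hypothetical, caller-supplied).
    Returns the best mask seen and its cost.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_wavelengths)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(n_iter):
        # Neighbour move: toggle one randomly chosen wavelength in or out.
        candidate = current[:]
        j = rng.randrange(n_wavelengths)
        candidate[j] = not candidate[j]
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse masks with prob exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

With a toy cost such as the Hamming distance to a known mask, the search recovers that mask; with a real calibration cost every candidate requires refitting a model, which is what makes these methods computationally expensive.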

3.
Analyst ; 141(19): 5586-97, 2016 Oct 07.
Article in English | MEDLINE | ID: mdl-27435388

ABSTRACT

Variable selection and outlier detection are important processes in chemical modeling. They usually affect each other, and the order in which they are performed also strongly affects the modeling results. Currently, many studies perform these processes separately and in different orders. In this study, we examined the interaction between outliers and variables and compared modeling procedures that perform variable selection and outlier detection in different orders. Because the order of outlier detection and variable selection can affect the interpretation of the model, it is difficult to decide which order is preferable when the predictive performance (prediction errors) of the different orders is relatively close. To address this problem, a simultaneous variable selection and outlier detection approach called Model Adaptive Space Shrinkage (MASS) was developed. The approach is based on model population analysis (MPA). Through weighted binary matrix sampling (WBMS) from the model space, a large number of partial least squares (PLS) regression models were built, and the best-performing (elite) models were selected to statistically reassign the weight of each variable and sample. The whole process was then repeated until the weights of the variables and samples converged. Finally, MASS adaptively found a high-performance model consisting of an optimized variable subset and an optimized sample subset. The combination of these two subsets can be considered the cleaned dataset for chemical modeling. The proposed approach thus avoids the problem of choosing the order of variable selection and outlier detection. One near infrared spectroscopy (NIR) dataset and one quantitative structure-activity relationship (QSAR) dataset were used to test the approach. The results demonstrated that MASS is a useful method for data cleaning before building a predictive model.

4.
Int J Biol Macromol ; 87: 290-4, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26927937

ABSTRACT

A method using partial least squares (PLS) for the simultaneous determination of neutral and uronic sugars was developed in this paper. The method is based on the color-development reaction between the analytes and anthrone. The calibration set was built with 25 binary solutions at concentrations ranging from 20 to 100 µg/mL for glucose and from 10 to 50 µg/mL for glucuronic acid. An independent prediction set was used to check the robustness of the PLS calibration model. The root-mean-square errors of prediction (RMSEP) for neutral and uronic sugars were 1.2233 and 1.9367, respectively, and the corresponding coefficients of determination for the prediction set (Rp(2)) were 0.9971 and 0.9767. Compared with the univariate method, the proposed method improves detection accuracy. The method was also applied to commercial polysaccharides and Glycyrrhiza uralensis polysaccharides (GUPs), and the results indicated that the PLS model is suitable for the simultaneous determination of neutral and uronic sugars.
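The two figures of merit quoted above can be computed from the reference and predicted concentrations of the prediction set as sketched below. The sketch assumes Rp(2) is computed as 1 - SS_res/SS_tot; some papers instead report the squared Pearson correlation between reference and predicted values, which can differ when predictions are biased. The function names are illustrative.

```python
import math

def rmsep(y_ref, y_pred):
    """Root-mean-square error of prediction over the prediction set."""
    n = len(y_ref)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / n)

def r2_prediction(y_ref, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot on the prediction set."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot
```

For instance, with made-up reference values [1, 2, 3, 4] and predictions [2, 2, 3, 4], RMSEP is 0.5 and Rp(2) is 0.8.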


Subjects
Carbohydrates/analysis , Carbohydrates/chemistry , Spectrophotometry, Ultraviolet/methods , Uronic Acids/chemistry , Calibration , Glycyrrhiza/chemistry , Least-Squares Analysis , Principal Component Analysis , Reproducibility of Results , Time Factors
5.
Analyst ; 141(6): 1973-80, 2016 Mar 21.
Article in English | MEDLINE | ID: mdl-26846329

ABSTRACT

To solve the spectra standardization problem in near-infrared (NIR) spectroscopy, a Transfer via Extreme learning machine Auto-encoder Method (TEAM) is proposed in this study. TEAM was compared with piecewise direct standardization (PDS), generalized least squares (GLS) and calibration transfer methods based on canonical correlation analysis (CCA), and the performance of these algorithms was benchmarked on three spectral datasets: corn, tobacco and pharmaceutical tablet spectra. The results show that TEAM is a stable method and can significantly reduce prediction errors compared with PDS, GLS and CCA. TEAM also achieves the best RMSEPs in most cases with a small calibration set. TEAM is implemented in Python and available as an open source package at https://github.com/zmzhang/TEAM.

6.
J Chromatogr B Analyt Technol Biomed Life Sci ; 1015-1016: 82-91, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26901849

ABSTRACT

Traditional Chinese medicines (TCMs) pose great challenges for quality control and efficacy evaluation because of their complex chemical composition. Chemometric techniques provide a good opportunity to extract more useful chemical information from TCMs, so their application in the field of TCMs is natural and necessary. This review focuses on recent important chemometric tools for chromatographic fingerprinting, including peak alignment, information features and baseline correction, and on applications of chemometrics in metabolomics and in the modernization of TCMs, including authentication and quality evaluation of TCMs, efficacy evaluation of TCMs and the essence of TCM syndromes. In the conclusions, general trends and some recommendations for improving chromatographic metabolomics data analysis are provided.


Subjects
Chromatography , Drugs, Chinese Herbal/analysis , Metabolomics , Chromatography/methods , Chromatography/standards , Metabolomics/methods , Metabolomics/standards
7.
Anal Chim Acta ; 911: 27-34, 2016 Mar 10.
Article in English | MEDLINE | ID: mdl-26893083

ABSTRACT

Biomarker discovery is an important goal in metabolomics. It is typically modeled as selecting the most discriminating metabolites for classification, often referred to as variable importance analysis or variable selection. A number of variable importance analysis methods for discovering biomarkers in metabolomics studies have been proposed. However, different methods are likely to generate different variable rankings because they rest on different principles; each method produces a ranked list of variables, much as an expert presents an opinion. The inconsistency between different variable ranking methods is often ignored. A simple and ideal solution to this problem is to take every ranking into account. In this study, a strategy called rank aggregation was employed. It is an indispensable tool for merging individual ranked lists into a single "super"-list that reflects the overall preference or importance within the population. This "super"-list is regarded as the final ranking for biomarker discovery, and it was used to discover biomarkers and to select the variable subset with the highest predictive classification accuracy. Nine methods were used: three univariate filter methods and six multivariate methods. When applied to two metabolic datasets (Childhood overweight dataset and Tubulointerstitial lesions dataset), rank aggregation greatly improved prediction accuracy compared with using all variables. It also outperformed the penalized method least absolute shrinkage and selection operator (LASSO), giving higher prediction accuracy or fewer selected variables, which are more interpretable.
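The abstract does not specify which aggregation algorithm merges the nine lists. As a minimal stand-in, the sketch below aggregates ranked lists by mean position (a Borda-style count); this is a common simple rank aggregation rule, not necessarily the one used in the study, and `aggregate_ranks` is a hypothetical name.

```python
def aggregate_ranks(rankings):
    """Merge ranked lists (most important first) by mean position.

    Every list must rank the same set of variables exactly once. Ties in mean
    position are broken by the variable name so the output is deterministic.
    """
    variables = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != variables or len(ranking) != len(variables):
            raise ValueError("all rankings must contain the same variables once")
    totals = {v: 0 for v in variables}
    for ranking in rankings:
        for pos, v in enumerate(ranking):
            totals[v] += pos
    mean_pos = {v: total / len(rankings) for v, total in totals.items()}
    # Lower mean position = more important in the aggregated "super"-list.
    return sorted(variables, key=lambda v: (mean_pos[v], str(v)))
```

For instance, the lists [a, b, c], [b, a, c] and [a, c, b] give mean positions 1/3, 1 and 5/3, so the aggregated list is [a, b, c].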


Subjects
Biomarkers/metabolism , Metabolomics , Case-Control Studies , Child , Gas Chromatography-Mass Spectrometry , Humans , Models, Theoretical , Overweight/blood
8.
Anal Chim Acta ; 908: 63-74, 2016 Feb 18.
Article in English | MEDLINE | ID: mdl-26826688

ABSTRACT

In this study, a new variable selection method called bootstrapping soft shrinkage (BOSS) is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined from the absolute values of regression coefficients. WBS is applied according to these weights to generate sub-models, and MPA is used to analyze the sub-models and update the variable weights. The optimization follows the rule of soft shrinkage: less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected as optimal. The method was tested on three groups of near infrared (NIR) spectroscopic datasets: corn, diesel fuel and soy datasets. Three high-performing variable selection methods, Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), were used for comparison. The results show that BOSS is promising, with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
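The weighted bootstrap sampling step can be sketched as follows: for each sub-model, p variable indices are drawn with replacement with probability proportional to the current weights, and the distinct indices form that sub-model's variable subset. This is a minimal sketch under those assumptions, not the published Matlab code; the PLS fitting and the coefficient-based weight update are not shown, and `weighted_bootstrap_subsets` is a hypothetical name.

```python
import random

def weighted_bootstrap_subsets(weights, n_models, rng):
    """Generate variable subsets by weighted bootstrap sampling (WBS).

    For each sub-model, p indices are drawn with replacement with probability
    proportional to `weights`; the distinct indices form the subset.
    Variables with zero weight are never drawn.
    """
    p = len(weights)
    subsets = []
    for _ in range(n_models):
        draw = rng.choices(range(p), weights=weights, k=p)
        subsets.append(sorted(set(draw)))
    return subsets
```

Because sampling is with replacement, a low-weight variable is drawn less often but is not removed outright, which matches the soft-shrinkage idea; its weight can recover in later iterations.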


Subjects
Models, Chemical , Algorithms , Least-Squares Analysis , Monte Carlo Method , Spectroscopy, Near-Infrared
9.
Analyst ; 140(23): 7955-64, 2015 Dec 07.
Article in English | MEDLINE | ID: mdl-26514234

ABSTRACT

Accurate peak detection is essential for analyzing high-throughput datasets generated by analytical instruments. Derivative methods with noise reduction and matched filtering are frequently used, but they are sensitive to baseline variations, random noise and deviations in peak shape. Continuous wavelet transform (CWT)-based methods are more practical and popular in this situation: they can increase accuracy and reliability by identifying peaks across scales in wavelet space while implicitly removing noise and the baseline. However, their computational load is relatively high, and the estimated peak features may be inaccurate when peaks are overlapping, dense or weak. In this study, we present multi-scale peak detection (MSPD), which takes full advantage of additional information in wavelet space, including ridges, valleys and zero-crossings. It achieves high accuracy by thresholding each detected peak with the maximum of its ridge. MSPD was comprehensively evaluated with MALDI-TOF spectra in proteomics, the CAMDA 2006 SELDI dataset and the Romanian database of Raman spectra, and it is particularly suitable for detecting peaks in high-throughput analytical signals. Receiver operating characteristic (ROC) curves show that MSPD detects more true peaks while keeping the false discovery rate lower than the MassSpecWavelet and MALDIquant methods. Superior results on Raman spectra suggest that MSPD is a more universal method for peak detection. MSPD has been designed and implemented efficiently in Python and Cython. It is available as an open source package at .
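To show the basic CWT idea that MSPD builds on, the sketch below computes the wavelet response of a signal at a single scale with a Ricker (Mexican hat) wavelet and keeps local maxima above a relative cutoff. It is a deliberately simplified illustration, not MSPD: it uses one scale instead of linking maxima across scales into ridges, and it ignores valleys and zero-crossings. The function names, the kernel length of about 10a and the 10% cutoff are assumptions.

```python
import math

def ricker(points, a):
    """Ricker (Mexican hat) wavelet of width a sampled at `points` positions."""
    amp = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    half = (points - 1) / 2.0
    return [amp * (1 - ((i - half) / a) ** 2) * math.exp(-((i - half) ** 2) / (2 * a ** 2))
            for i in range(points)]

def cwt_response(signal, a):
    """Wavelet coefficients at one scale (symmetric kernel, zero padding)."""
    kernel = ricker(int(10 * a) | 1, a)  # odd length keeps the kernel centred
    h = len(kernel) // 2
    n = len(signal)
    return [sum(w * signal[i + k - h] for k, w in enumerate(kernel) if 0 <= i + k - h < n)
            for i in range(n)]

def detect_peaks(signal, a=3.0, rel_threshold=0.1):
    """Indices of local maxima of the wavelet response above a relative cutoff."""
    resp = cwt_response(signal, a)
    cutoff = rel_threshold * max(resp)
    return [i for i in range(1, len(resp) - 1)
            if resp[i] > resp[i - 1] and resp[i] > resp[i + 1] and resp[i] > cutoff]
```

Because the Ricker wavelet integrates to zero, a constant baseline contributes nothing to the response, which is the implicit baseline removal the abstract refers to.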

10.
Int J Biol Macromol ; 79: 681-6, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26051342

ABSTRACT

Glycyrrhiza uralensis, an important Chinese medicine, has a long history of use in China. In this study, three water-soluble polysaccharide fractions (GUPs-1, GUPs-2 and GUPs-3) were isolated and purified from the root of G. uralensis by DEAE-52 and Sephadex G-100 column chromatography. The physicochemical properties and antioxidant activities of the three purified polysaccharides were investigated. The molecular weights of GUPs-1, GUPs-2 and GUPs-3 were 10,160, 11,680 and 13,360 Da, and their glucose ratios were 23.4%, 14% and 1.13%, respectively. The antioxidant activities of the three purified polysaccharides followed the order GUPs-1 > GUPs-2 > GUPs-3. At the same concentration, GUPs with lower molecular weight and a higher glucose ratio generally exhibited higher antioxidant activity. This indicates that the molecular weight and monosaccharide composition of GUPs could affect their antioxidant activities.


Subjects
Antioxidants/chemistry , Glycyrrhiza uralensis/chemistry , Iron Chelating Agents/chemistry , Plant Roots/chemistry , Polysaccharides/chemistry , Antioxidants/isolation & purification , Biphenyl Compounds/antagonists & inhibitors , Chromatography, Gel , Chromatography, Ion Exchange , Glucose/analysis , Hydroxyl Radical/antagonists & inhibitors , Iron Chelating Agents/isolation & purification , Molecular Weight , Oxidation-Reduction , Picrates/antagonists & inhibitors , Polysaccharides/isolation & purification
11.
Anal Chim Acta ; 880: 32-41, 2015 Jun 23.
Article in English | MEDLINE | ID: mdl-26092335

ABSTRACT

Partial least squares (PLS) is one of the most widely used methods for chemical modeling. However, like many other methods with tunable parameters, it has a strong tendency to over-fit. Thus, a crucial step in building a PLS model is selecting the optimal number of latent variables (nLVs). Cross-validation (CV) is the most popular method for PLS model selection because it selects a model from the perspective of prediction ability. However, CV may not yield a clear minimum of prediction error, which makes model selection difficult. To solve this problem, we propose a new strategy for PLS model selection that combines the cross-validated coefficient of determination (Qcv(2)) and model stability (S). S is defined as the stability of the PLS regression vectors and is obtained using model population analysis (MPA). The results show that, when a clear maximum of Qcv(2) is not obtained, S provides additional information about over-fitting and helps in finding the optimal nLVs. Compared with other regression-vector-based indicators such as the Euclidean 2-norm (B2), the Durbin-Watson statistic (DW) and the jaggedness (J), S is more sensitive to over-fitting. The model selected by our method has both good prediction ability and good stability.


Subjects
Algorithms , Models, Chemical , Least-Squares Analysis , Software , Glycine max/chemistry , Glycine max/metabolism , Spectrophotometry, Ultraviolet
12.
Int J Biol Macromol ; 79: 983-7, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26093314

ABSTRACT

A method for the quantitative analysis of polysaccharide content in Glycyrrhiza was developed based on near infrared (NIR) spectroscopy, using the phenol-sulphuric acid method as the reference method. This is the first use of this method to predict polysaccharide content in Glycyrrhiza. To improve the predictive ability (or robustness) of the model, the competitive adaptive reweighted sampling (CARS) strategy was used to select relevant wavelengths. With this restricted set of relevant wavelengths, the PLS model was more efficient and parsimonious. The coefficient of determination of prediction (Rp(2)) and the root mean square error of prediction (RMSEP) of the optimum model for polysaccharides were 0.9119 and 0.4350, respectively. The selected wavelengths were also interpreted, and all the wavelengths selected by CARS were shown to be related to functional groups of polysaccharides. The overall results show that NIR spectroscopy combined with chemometrics can be efficiently utilised for the analysis of polysaccharide content in Glycyrrhiza.


Subjects
Antioxidants/chemistry , Glycyrrhiza/chemistry , Polysaccharides/isolation & purification , Antioxidants/isolation & purification , Fruit/chemistry , Polysaccharides/chemistry , Polysaccharides/classification , Spectroscopy, Near-Infrared
13.
Biochem Biophys Res Commun ; 461(1): 186-92, 2015 May 22.
Article in English | MEDLINE | ID: mdl-25881503

ABSTRACT

Renal interstitial fibrosis is closely related to chronic kidney disease and is regarded as the final common pathway in most cases of end-stage renal disease. Metabolomic biomarkers can facilitate early diagnosis and improve understanding of the pathogenesis underlying renal fibrosis. Gas chromatography-mass spectrometry (GC/MS) is one of the most promising techniques for identifying metabolites. However, background signals, baseline offsets and overlapping peaks prevent accurate identification of the metabolites. In this study, GC/MS coupled with chemometric methods was developed to accurately identify metabolites and search for metabolic biomarkers in rats with renal fibrosis. Using these methods, seventy-six metabolites in rat serum were accurately identified, and five metabolites (urea, ornithine, citric acid, galactose and cholesterol) may be useful as potential biomarkers for renal fibrosis.


Subjects
Algorithms , Biomarkers/blood , Blood Chemical Analysis/methods , Data Interpretation, Statistical , Gas Chromatography-Mass Spectrometry/methods , Kidney/metabolism , Renal Insufficiency, Chronic/blood , Animals , Fibrosis/blood , Male , Multivariate Analysis , Rats , Rats, Wistar , Renal Insufficiency, Chronic/diagnosis , Reproducibility of Results , Sensitivity and Specificity
14.
J Chromatogr A ; 1393: 47-56, 2015 May 08.
Article in English | MEDLINE | ID: mdl-25818557

ABSTRACT

Solvent system selection is the first step toward a successful counter-current chromatography (CCC) separation. This paper introduces a systematic and practical solvent system selection strategy based on the nonrandom two-liquid segment activity coefficient (NRTL-SAC) model, which is efficient in predicting solute partition coefficients. First, the application of the NRTL-SAC method was extended to the ethyl acetate/n-butanol/water and chloroform/methanol/water solvent system families. Next, the versatility and predictive capability of the NRTL-SAC method were investigated. The results indicate that solute molecular parameters identified from the hexane/ethyl acetate/methanol/water solvent system family can predict a large number of partition coefficients in several other solvent system families. The NRTL-SAC strategy was further validated by successfully separating five components from Salvia plebeia R.Br. We therefore propose that NRTL-SAC is a promising high-throughput method for rapid solvent system selection and is highly adaptable to screening suitable solvent systems for real-life CCC separations.


Subjects
Chromatography, High Pressure Liquid/methods , Countercurrent Distribution/methods , Solvents/chemistry , 1-Butanol/chemistry , Acetates/chemistry , Chloroform/chemistry , Hexanes/chemistry , Methanol/chemistry , Plant Extracts/chemistry , Salvia/chemistry , Water/chemistry
15.
Analyst ; 140(6): 1876-85, 2015 Mar 21.
Article in English | MEDLINE | ID: mdl-25665981

ABSTRACT

In this study, a new wavelength interval selection algorithm, the interval variable iterative space shrinkage approach (iVISSA), is proposed based on the VISSA algorithm. It combines global and local searches to iteratively and intelligently optimize the locations, widths and combinations of spectral intervals. In the global search, it inherits the soft shrinkage of VISSA to search for the locations and combinations of informative wavelengths, whereas in the local search it uses the continuity of spectroscopic data to determine the widths of wavelength intervals. The global and local searches are carried out alternately to realize wavelength interval selection. The method was tested on three near infrared (NIR) datasets. Several high-performing wavelength selection methods, namely synergy interval partial least squares (siPLS), moving window partial least squares (MW-PLS), competitive adaptive reweighted sampling (CARS), genetic algorithm PLS (GA-PLS) and interval random frog (iRF), were used for comparison. The results show that the proposed method is very promising, performing well in both prediction capability and stability. The MATLAB codes for implementing iVISSA are freely available on the website: .


Subjects
Algorithms , Spectroscopy, Near-Infrared/methods , Flour/analysis , Least-Squares Analysis , Glycine max/chemistry , Tablets/chemistry , Zea mays/chemistry
16.
Anal Chim Acta ; 862: 14-23, 2015 Mar 03.
Article in English | MEDLINE | ID: mdl-25682424

ABSTRACT

Variable (wavelength or feature) selection has become a critical step in the analysis of datasets with a high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), is proposed. The strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which embodies the simple and effective principle of 'survival of the fittest' from Darwin's theory of natural evolution, is employed to determine the number of variables to keep and to continuously shrink the variable space. Second, in each EDF run, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of variable subsets from which a population of sub-models is constructed. Model population analysis (MPA) is then employed to find the variable subsets with lower root mean squared error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed; the higher the frequency, the more important the variable. The performance of the proposed procedure was investigated on three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm-partial least squares (GA-PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
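The two procedures described above can be sketched as follows. The EDF is written in the CARS-style form r_i = a·e^(-k·i) with a = e^k and k = ln(p/p_final)/(N - 1), so the first run keeps all p variables and the last keeps p_final; these endpoints and the function names are assumptions for illustration, not taken from the VCPA code. The BMS function builds a balanced binary matrix in which every variable appears in exactly half of the sub-models; fitting PLS sub-models and counting frequencies in the best 10% are left out.

```python
import math
import random

def edf_schedule(p, p_final, n_runs):
    """Number of variables kept in each EDF run, shrinking p -> p_final.

    Ratio r_i = exp(-k (i - 1)) = a exp(-k i) with a = exp(k) and
    k = ln(p / p_final) / (n_runs - 1), so r_1 = 1 and r_N = p_final / p.
    """
    k = math.log(p / p_final) / (n_runs - 1)
    return [round(p * math.exp(-k * (i - 1))) for i in range(1, n_runs + 1)]

def binary_matrix_sampling(variables, n_models, rng):
    """BMS: every variable appears in exactly half of the n_models subsets."""
    if n_models % 2:
        raise ValueError("n_models must be even")
    columns = []
    for _ in variables:
        column = [1] * (n_models // 2) + [0] * (n_models // 2)
        rng.shuffle(column)  # random but balanced inclusion pattern per variable
        columns.append(column)
    # Row m of the binary matrix gives the variable subset of sub-model m.
    return [[v for v, col in zip(variables, columns) if col[m]]
            for m in range(n_models)]
```

For example, `edf_schedule(100, 10, 5)` returns [100, 56, 32, 18, 10]; in each run, the variables kept would be the ones appearing most often in the best sub-models.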


Subjects
Models, Statistical , Algorithms , Calibration , Least-Squares Analysis , Monte Carlo Method , Multivariate Analysis
17.
J Chromatogr A ; 1370: 179-86, 2014 Nov 28.
Article in English | MEDLINE | ID: mdl-25454143

ABSTRACT

In this work, a novel strategy based on chromatographic fingerprints and chemometric techniques is proposed for the quantitative analysis of formulated complex systems. Here, a formulated complex system is a complicated analytical system that contains more than one raw material at concentrations set by a certain formula. The strategy is illustrated by the quantitative determination of mixtures of three essential oils. Its three key steps are: (1) remove the baselines of the chromatograms; (2) align retention times; (3) perform quantitative analysis by multivariate regression on the entire chromatographic profiles. The feasibility of the strategy was validated by determining the concentration compositions of nine mixtures arranged by uniform design, and the factors that influence the quantitative results are also discussed. The strategy proved viable, and the validation indicates that the quantitative results depend mainly on the efficiency of the alignment method and on the chromatographic peak shapes. Previously, chromatographic fingerprints were used only for the identification and/or recognition of products. This work demonstrates that, with the assistance of effective chemometric techniques, chromatographic fingerprints are also promising for solving quantitative problems in complex analytical systems.


Subjects
Gas Chromatography-Mass Spectrometry/methods , Feasibility Studies , Oils, Volatile/analysis
18.
Analyst ; 139(19): 4836-45, 2014 Oct 07.
Article in English | MEDLINE | ID: mdl-25083512

ABSTRACT

In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA), based on the idea of model population analysis (MPA), is proposed for variable selection. Unlike most existing optimization methods for variable selection, VISSA statistically evaluates the performance of the variable space at each optimization step. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks at each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied by existing methods, is the core of the VISSA strategy. Compared with promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly: only a few insensitive parameters are needed, and the program terminates automatically without any additional stopping conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
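A minimal sketch of weighted binary matrix sampling and a frequency-based weight update is shown below. It assumes, for illustration, that column j of the sampling matrix contains round(w_j · K) ones placed at random rows (K sub-models, weights in [0, 1]) and that the new weight of a variable is its frequency among the best-performing sub-models. The sub-model errors (e.g. RMSECV of PLS fits) are a caller-supplied, hypothetical input; `wbms` and `update_weights` are illustrative names, not the published Matlab functions.

```python
import random

def wbms(weights, n_models, rng):
    """Weighted binary matrix sampling.

    Returns an n_models x p 0/1 matrix in which column j contains
    round(weights[j] * n_models) ones at random rows, so a variable with
    weight 1 is in every sub-model and one with weight 0 in none.
    """
    p = len(weights)
    matrix = [[0] * p for _ in range(n_models)]
    for j, w in enumerate(weights):
        for row in rng.sample(range(n_models), round(w * n_models)):
            matrix[row][j] = 1
    return matrix

def update_weights(matrix, errors, best_fraction=0.1):
    """New weight of each variable = its frequency in the best sub-models.

    errors[m] is the error of the sub-model built from row m (lower is better).
    """
    n_best = max(1, int(len(matrix) * best_fraction))
    best_rows = sorted(range(len(matrix)), key=lambda m: errors[m])[:n_best]
    p = len(matrix[0])
    return [sum(matrix[m][j] for m in best_rows) / n_best for j in range(p)]
```

Repeating sampling and updating drives weights toward 0 or 1, so the variable space shrinks gradually instead of through hard elimination.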


Subjects
Algorithms , Gasoline/analysis , Models, Theoretical , Monte Carlo Method , Software , Soybean Oil/chemistry , Triticum/chemistry , Triticum/metabolism
19.
Anal Chem ; 86(15): 7446-54, 2014 Aug 05.
Article in English | MEDLINE | ID: mdl-25032905

ABSTRACT

Accurate prediction of peptide fragment ion mass spectra is critical for confident peptide identification by protein sequence database search in bottom-up proteomics. To predict this type of mass spectra accurately and comprehensively, a framework named MS(2)PBPI is proposed. MS(2)PBPI first extracts fragment ions from large-scale MS/MS spectra data sets according to peptide fragmentation pathways and uses binary trees to divide the resulting bulky data into tens to more than 1000 regions. For each adequate region, a stochastic gradient boosting tree regression model is constructed. By constructing hundreds of these models, MS(2)PBPI predicts MS/MS spectra of unmodified and modified peptides with reasonable accuracy. Moreover, high consistency is achieved between predicted and experimental MS/MS spectra acquired on different ion trap instruments with low and high resolving power. MS(2)PBPI outperforms the existing algorithms MassAnalyzer and PeptideART.


Subjects
Data Mining/methods , Peptide Fragments/chemistry , Tandem Mass Spectrometry/methods
20.
J Chromatogr A ; 1355: 80-5, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24951288

ABSTRACT

Selecting an appropriate solvent system is of great importance for a successful counter-current chromatography (CCC) separation. In this work, the nonrandom two-liquid (NRTL) model, a thermodynamic method, was used to predict partition coefficients from a few measured ones. For model solutes, the NRTL method gave quite satisfactory results, first correlating the measured partition coefficients in a few representative biphasic liquid systems and then successfully predicting the partition coefficients in other two-phase liquid systems. A suitable solvent system can then be screened according to the predicted partition coefficients. With the help of the NRTL method, the solvent system hexane/ethyl acetate/methanol/water (1:4:1:4, v/v) was rapidly screened and used to separate two major compounds with high purity from Malus hupehensis leaves. The results demonstrate that the NRTL model offers a simple and practical strategy for estimating partition coefficients in support of CCC solvent system selection, which significantly reduces the experimental effort and cost of solvent system selection.
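For reference, the standard binary NRTL equations give activity coefficients from the interaction parameters τ12, τ21 and the non-randomness parameter α, with G12 = exp(-α·τ12) and G21 = exp(-α·τ21). The sketch below implements them and then estimates a solute partition coefficient on a mole-fraction basis as the ratio of infinite-dilution activity coefficients (from x_upper·γ_upper = x_lower·γ_lower), treating each liquid phase as a single pseudo-solvent. That pseudo-binary simplification, the function names and the default α = 0.3 are assumptions for illustration; real biphasic CCC systems are multicomponent and the study's own parameterization is not reproduced here.

```python
import math

def nrtl_binary_gammas(x1, tau12, tau21, alpha=0.3):
    """Activity coefficients (gamma1, gamma2) from the binary NRTL model."""
    x2 = 1.0 - x1
    g12 = math.exp(-alpha * tau12)
    g21 = math.exp(-alpha * tau21)
    ln_g1 = x2 ** 2 * (tau21 * (g21 / (x1 + x2 * g21)) ** 2
                       + tau12 * g12 / (x2 + x1 * g12) ** 2)
    ln_g2 = x1 ** 2 * (tau12 * (g12 / (x2 + x1 * g12)) ** 2
                       + tau21 * g21 / (x1 + x2 * g21) ** 2)
    return math.exp(ln_g1), math.exp(ln_g2)

def partition_coefficient(tau_upper, tau_lower, alpha=0.3):
    """K = x_upper / x_lower for a solute (component 1) at infinite dilution.

    tau_upper and tau_lower are (tau12, tau21) pairs for the solute with each
    phase treated as one pseudo-solvent (a simplifying assumption).
    """
    gamma_upper, _ = nrtl_binary_gammas(0.0, *tau_upper, alpha=alpha)
    gamma_lower, _ = nrtl_binary_gammas(0.0, *tau_lower, alpha=alpha)
    return gamma_lower / gamma_upper
```

At infinite dilution the first equation reduces to ln γ1∞ = τ21 + τ12·G12, so once the interaction parameters have been fitted to a few measured systems, the partition coefficient in a new system follows without further experiments.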


Subjects
Chromatography, High Pressure Liquid/methods , Countercurrent Distribution/methods , Acetates/chemistry , Chromatography, High Pressure Liquid/instrumentation , Countercurrent Distribution/instrumentation , Hexanes/chemistry , Methanol/chemistry , Solvents/chemistry , Thermodynamics , Water/chemistry