Results 1-13 of 13
1.
Anal Chim Acta ; 908: 63-74, 2016 Feb 18.
Article in English | MEDLINE | ID: mdl-26826688

ABSTRACT

In this study, a new variable selection method called bootstrapping soft shrinkage (BOSS) is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of the variables are determined from the absolute values of the regression coefficients. WBS is applied according to these weights to generate sub-models, and MPA is used to analyze the sub-models and update the variable weights. The optimization follows the rule of soft shrinkage: less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected as optimal. The method was tested on three groups of near-infrared (NIR) spectroscopic datasets, i.e. corn, diesel fuel and soy datasets. Three high-performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising, with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
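The BOSS loop described above can be illustrated with a minimal Python sketch of one soft-shrinkage step. It assumes a hypothetical callback `fit(subset)` that returns the RMSECV of a sub-model together with its regression coefficients (in practice a PLS model); the bootstrap size (the number of variables that still carry weight), the number of sub-models and the 10% cut-off are illustrative choices, not values taken from the paper or from the published Matlab code.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sub-model fitter (e.g. a PLS model): given variable indices,
# returns (RMSECV, {variable index: regression coefficient}).
FitFn = Callable[[List[int]], Tuple[float, Dict[int, float]]]


def weighted_bootstrap_subsets(weights: Sequence[float], n_models: int,
                               rng: random.Random) -> List[List[int]]:
    """Weighted bootstrap sampling (WBS): each subset is the set of distinct
    indices in a with-replacement draw, with P(variable i) proportional to
    weights[i]; the draw size equals the number of variables still weighted."""
    alive = [i for i, w in enumerate(weights) if w > 0]
    alive_w = [weights[i] for i in alive]
    return [sorted(set(rng.choices(alive, weights=alive_w, k=len(alive))))
            for _ in range(n_models)]


def boss_update(weights: Sequence[float], fit: FitFn, n_models: int = 200,
                top_frac: float = 0.1, seed: int = 0) -> List[float]:
    """One soft-shrinkage step: sample sub-models, keep the best fraction by
    RMSECV, and rebuild the weights from their normalized |coefficients|."""
    rng = random.Random(seed)
    results = [fit(s) for s in weighted_bootstrap_subsets(weights, n_models, rng)]
    results.sort(key=lambda r: r[0])  # lowest RMSECV first
    new_w = [0.0] * len(weights)
    for _, coefs in results[:max(1, int(top_frac * n_models))]:
        total = sum(abs(c) for c in coefs.values()) or 1.0
        for i, c in coefs.items():
            new_w[i] += abs(c) / total
    s = sum(new_w)
    # Variables absent from every best sub-model end with weight 0, so the
    # space shrinks gradually instead of by hard elimination.
    return [w / s for w in new_w] if s > 0 else new_w
```

Calling `boss_update` repeatedly and recording the best RMSECV of each round mimics the iteration until a single variable is left.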


Subjects
Models, Chemical; Algorithms; Least-Squares Analysis; Monte Carlo Method; Spectroscopy, Near-Infrared
2.
Anal Chim Acta ; 862: 14-23, 2015 Mar 03.
Article in English | MEDLINE | ID: mdl-25682424

ABSTRACT

Variable (wavelength or feature) selection has become a critical step in the analysis of datasets with a large number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), was proposed. The strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which embodies the simple and effective principle of 'survival of the fittest' from Darwin's theory of natural evolution, is employed to determine the number of variables to keep and to continuously shrink the variable space. Second, in each EDF run, the binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is built. Model population analysis (MPA) is then employed to find the variable subsets with the lowest root mean squared error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed: the higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm-partial least squares (GA-PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
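To make the two VCPA procedures concrete, here is a minimal Python sketch of an EDF schedule, binary matrix sampling and one frequency-based shrinkage run. The EDF parameterization (a ratio falling from 1 to 2/p, as in CARS), the hypothetical `score(subset)` callback returning RMSECV, and the default sizes are assumptions for illustration only.

```python
import random
from typing import Callable, Dict, List, Sequence

ScoreFn = Callable[[List[int]], float]  # hypothetical: RMSECV of a sub-model


def edf_keep_counts(p: int, n_runs: int) -> List[int]:
    """Variables kept after each EDF run. The ratio r_i = a*exp(-k*i), with
    a = (p/2)**(1/(N-1)) and k = ln(p/2)/(N-1) (assumed CARS-style choice),
    gives r_1 = 1 and r_N = 2/p; it simplifies to (p/2)**((1-i)/(N-1))."""
    return [max(2, round(p * (p / 2) ** ((1 - i) / (n_runs - 1))))
            for i in range(1, n_runs + 1)]


def binary_matrix_sampling(variables: Sequence[int], n_models: int,
                           rng: random.Random) -> List[List[int]]:
    """BMS: every column holds n_models // 2 ones in random rows, so each
    variable appears in the same number of sub-models."""
    rows: List[List[int]] = [[] for _ in range(n_models)]
    for v in variables:
        column = [1] * (n_models // 2) + [0] * (n_models - n_models // 2)
        rng.shuffle(column)
        for r, bit in enumerate(column):
            if bit:
                rows[r].append(v)
    return rows


def vcpa_run(variables: Sequence[int], keep: int, score: ScoreFn,
             n_models: int = 1000, top_frac: float = 0.1,
             seed: int = 0) -> List[int]:
    """One EDF run: BMS, then model population analysis on the best
    sub-models, then keep the `keep` most frequent variables."""
    rng = random.Random(seed)
    subsets = binary_matrix_sampling(variables, n_models, rng)
    subsets.sort(key=lambda s: score(s) if s else float("inf"))
    freq: Dict[int, int] = {v: 0 for v in variables}
    for s in subsets[:max(1, int(top_frac * n_models))]:
        for v in s:
            freq[v] += 1
    return sorted(variables, key=lambda v: freq[v], reverse=True)[:keep]
```

Running `vcpa_run` once per entry of `edf_keep_counts`, each time on the variables kept by the previous run, reproduces the shrinking schedule.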


Subjects
Models, Statistical; Algorithms; Calibration; Least-Squares Analysis; Monte Carlo Method; Multivariate Analysis
3.
Analyst ; 139(19): 4836-45, 2014 Oct 07.
Article in English | MEDLINE | ID: mdl-25083512

ABSTRACT

In this study, a new optimization algorithm for variable selection, the Variable Iterative Space Shrinkage Approach (VISSA), is proposed based on the idea of model population analysis (MPA). Unlike most existing optimization methods for variable selection, VISSA statistically evaluates the performance of the variable space at each step of the optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks at each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied by existing methods, is the core of the VISSA strategy. Compared with promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly: it needs only a few parameters, to which its results are insensitive, and the program terminates automatically without any additional stopping conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
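The WBMS step and the two VISSA rules can be sketched in Python as follows. `score(subset)` is a hypothetical callback returning a sub-model's RMSECV; setting each new weight to the variable's frequency among the best sub-models, and using the population's mean RMSECV to check that the new space outperforms the old one, are simplifying assumptions rather than the published implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

ScoreFn = Callable[[List[int]], float]  # hypothetical: RMSECV of a sub-model


def weighted_binary_matrix_sampling(weights: Sequence[float], n_models: int,
                                    rng: random.Random) -> List[List[int]]:
    """WBMS: column i holds round(w_i * n_models) ones in random rows, so a
    variable with weight 1 enters every sub-model and one with weight 0 none."""
    rows: List[List[int]] = [[] for _ in range(n_models)]
    for v, w in enumerate(weights):
        ones = round(w * n_models)
        column = [1] * ones + [0] * (n_models - ones)
        rng.shuffle(column)
        for r, bit in enumerate(column):
            if bit:
                rows[r].append(v)
    return rows


def vissa_step(weights: Sequence[float], score: ScoreFn, n_models: int = 500,
               top_frac: float = 0.05, seed: int = 0) -> Tuple[List[float], float]:
    """Sample the current space (assumes some weight is nonzero); return new
    weights (frequency in the best sub-models) and the mean RMSECV."""
    rng = random.Random(seed)
    subsets = [s for s in weighted_binary_matrix_sampling(weights, n_models, rng) if s]
    scores = [score(s) for s in subsets]
    order = sorted(range(len(subsets)), key=scores.__getitem__)
    best = [subsets[j] for j in order[:max(1, int(top_frac * len(subsets)))]]
    new_w = [sum(v in s for s in best) / len(best) for v in range(len(weights))]
    return new_w, sum(scores) / len(scores)


def vissa(p: int, score: ScoreFn, max_iter: int = 30) -> List[float]:
    """Shrink the space while each new space outperforms the previous one."""
    weights: List[float] = [0.5] * p
    prev_weights, prev_mean = weights, float("inf")
    for it in range(max_iter):
        new_w, mean_rmsecv = vissa_step(weights, score, seed=it)
        if mean_rmsecv >= prev_mean:
            return prev_weights  # rule 2 fails: keep the previous space
        prev_weights, prev_mean = weights, mean_rmsecv
        weights = new_w
    return weights
```

Variables whose weight reaches 1 are always kept and those at 0 are dropped, which is how the space shrinks from step to step (rule 1).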


Subjects
Algorithms; Gasoline/analysis; Models, Theoretical; Monte Carlo Method; Software; Soybean Oil/chemistry; Triticum/chemistry; Triticum/metabolism
4.
Anal Chim Acta ; 807: 36-43, 2014 Jan 07.
Article in English | MEDLINE | ID: mdl-24356218

ABSTRACT

With the high dimensionality of modern datasets, creating effective methods that can select an optimal variable subset is a great challenge. In this study, a strategy called iteratively retaining informative variables (IRIV) was proposed, which considers the possible interaction effects among variables through random combinations. The variables are classified into four categories: strongly informative, weakly informative, uninformative and interfering. On this basis, IRIV retains both the strongly and the weakly informative variables in every iterative round until no uninformative or interfering variables remain. Three datasets were employed to investigate the performance of IRIV coupled with partial least squares (PLS). The results show that IRIV is a good alternative variable selection strategy when compared with three outstanding and frequently used variable selection methods: genetic algorithm-PLS, Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) and competitive adaptive reweighted sampling (CARS). The MATLAB source code of IRIV can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
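IRIV's four-way classification can be illustrated by comparing, for one variable, the RMSECV values of sub-models that include it with those of matching sub-models that exclude it. This Python sketch assumes the comparison uses the difference of means plus a two-sided Mann-Whitney U test at alpha = 0.05 (implemented here with a normal approximation and no tie correction); generating the paired sub-models from a random binary matrix is left out.

```python
import math
from statistics import mean
from typing import Sequence


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which x beats y; ties count one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def classify_variable(rmsecv_with: Sequence[float],
                      rmsecv_without: Sequence[float],
                      alpha: float = 0.05) -> str:
    """Compare sub-models that include (M_i) and exclude (M_-i) variable i."""
    dmean = mean(rmsecv_without) - mean(rmsecv_with)  # > 0: including i helps
    p = mann_whitney_p(rmsecv_with, rmsecv_without)
    if dmean > 0:
        return "strongly informative" if p < alpha else "weakly informative"
    return "interfering" if p < alpha else "uninformative"
```

In each round, variables classified as uninformative or interfering would be removed and the rest carried into the next random-combination round.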


Subjects
Algorithms; Models, Theoretical; Calibration; Internet; Least-Squares Analysis; Monte Carlo Method; Software; Soybean Oil/chemistry; Spectroscopy, Near-Infrared/standards; Water/analysis; Water/standards; Zea mays/chemistry
5.
Anal Chim Acta ; 740: 20-6, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22840646

ABSTRACT

The identification of disease-relevant genes is a challenge in microarray-based disease diagnosis, where the sample size is often limited. Among established methods, reversible jump Markov chain Monte Carlo (RJMCMC) methods have proven quite promising for variable selection. However, designing and applying an RJMCMC algorithm requires, for example, special criteria for the prior distributions. In addition, simulation from the joint posterior distribution of models is computationally intensive and may even be mathematically intractable. These disadvantages may limit the application of RJMCMC algorithms. Therefore, algorithms for selecting disease-associated genes are needed that retain the advantages of RJMCMC methods while being efficient and easy to follow. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The two top-ranked genes are Z50753 and U00968 for the colon dataset and Y10871_at and Z22536_at for the estrogen dataset. (The source code, released under the GNU General Public License Version 2.0, is freely available to non-commercial users at: http://code.google.com/p/randomfrog/.)
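The random frog idea, a reversible-jump-like random walk over variable subsets whose visit frequencies rank the variables, can be sketched as below. The callbacks `error` and `importance`, the proposal rule (model size drawn from a normal distribution, growth by sampling three candidates per added variable) and the acceptance probability `eta * cur_err / new_err` are assumptions for illustration, not details confirmed by the abstract.

```python
import random
from typing import Callable, List

# Hypothetical callbacks, not taken from the random frog code:
#   error(subset)         -> cross-validated error of a model built on `subset`
#   importance(subset, v) -> |coefficient| of variable v in that model
ErrorFn = Callable[[List[int]], float]
ImportanceFn = Callable[[List[int], int], float]


def random_frog(p: int, error: ErrorFn, importance: ImportanceFn,
                n_iter: int = 2000, q0: int = 5, theta: float = 0.3,
                eta: float = 0.1, seed: int = 0) -> List[float]:
    """Return each variable's selection probability over the random walk."""
    rng = random.Random(seed)
    current = rng.sample(range(p), q0)
    cur_err = error(current)
    counts = [0] * p
    for _ in range(n_iter):
        q = len(current)
        # Propose a model size from N(q, (theta*q)^2), clipped to [1, p].
        q_new = min(p, max(1, round(rng.gauss(q, theta * q))))
        candidates = list(current)
        if q_new > q:
            # Grow: add random outside variables before ranking.
            pool = [v for v in range(p) if v not in current]
            candidates += rng.sample(pool, min(len(pool), 3 * (q_new - q)))
        # Keep the q_new most important candidates (this shrinks if q_new < q).
        proposal = sorted(candidates, key=lambda v: importance(candidates, v),
                          reverse=True)[:q_new]
        new_err = error(proposal)
        # Accept improvements; accept worse models with probability eta*cur/new.
        if new_err <= cur_err or rng.random() < eta * cur_err / new_err:
            current, cur_err = proposal, new_err
        for v in current:
            counts[v] += 1
    return [c / n_iter for c in counts]
```

Genes would then be ranked by the returned selection probabilities, the highest first.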


Subjects
Breast Neoplasms/genetics; Colonic Neoplasms/genetics; Disease/classification; Disease/genetics; Markov Chains; Monte Carlo Method; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Colon/metabolism; Estrogens/genetics; Gene Expression Profiling; Humans
6.
J Chromatogr A ; 1223: 93-106, 2012 Feb 03.
Article in English | MEDLINE | ID: mdl-22222564

ABSTRACT

Chromatography has been extensively applied in many fields, such as metabolomics and the quality control of herbal medicines. Preprocessing, especially peak alignment, is a time-consuming task that must precede the extraction of useful information from the datasets by chemometrics and statistics. To align shifted peaks among one-dimensional chromatograms accurately and rapidly, multiscale peak alignment (MSPA) is presented in this research. The peaks of each chromatogram were detected based on the continuous wavelet transform (CWT) and aligned against a reference chromatogram gradually from large to small scale, and the alignment was accelerated by fast Fourier transform (FFT) cross-correlation. Comparison with two widely used alignment methods on a chromatographic dataset demonstrates that MSPA preserves the shapes of peaks and aligns at excellent speed. Furthermore, MSPA is robust and insensitive to noise and baseline. MSPA was implemented and is available at http://code.google.com/p/mspa.
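The speed-up mentioned above comes from computing the cross-correlation between a chromatogram segment and the reference in the frequency domain. This dependency-free Python sketch shows only that step (a radix-2 FFT, zero padding and choosing the best integer shift); the CWT peak detection and the large-to-small scale segmentation of MSPA are not reproduced, and all function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(X: Sequence[complex]) -> List[complex]:
    """Inverse FFT via the conjugation trick."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def best_shift(reference: Sequence[float], signal: Sequence[float],
               max_shift: int) -> int:
    """Shift s (|s| <= max_shift) maximizing sum_t reference[t] * signal[t - s],
    read from the FFT cross-correlation of the zero-padded inputs."""
    n = max(len(reference), len(signal))
    size = 1
    while size < 2 * n:  # padding makes circular correlation equal linear
        size *= 2
    R = fft(list(reference) + [0.0] * (size - len(reference)))
    S = fft(list(signal) + [0.0] * (size - len(signal)))
    corr = ifft([r * s.conjugate() for r, s in zip(R, S)])
    # Negative lags live at the end of the array, hence the modulo.
    return max(range(-max_shift, max_shift + 1), key=lambda s: corr[s % size].real)


def apply_shift(signal: Sequence[float], s: int) -> List[float]:
    """Shift a signal by s points (positive = later), filling with zeros."""
    n = len(signal)
    return [signal[t - s] if 0 <= t - s < n else 0.0 for t in range(n)]
```

For a signal whose peak sits 5 points later than in the reference, `best_shift` returns -5, and `apply_shift(signal, -5)` moves the peak back into register.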


Subjects
Algorithms; Chromatography/methods; Pattern Recognition, Automated/methods; Chromatography/economics; Chromatography, High Pressure Liquid/economics; Chromatography, High Pressure Liquid/methods; Fatty Acids/blood; Humans; Pattern Recognition, Automated/economics; Plants, Medicinal/chemistry; Time Factors
7.
Anal Chim Acta ; 706(1): 97-104, 2011 Nov 07.
Article in English | MEDLINE | ID: mdl-21995915

ABSTRACT

Data from high-throughput metabolomics experiments are becoming larger and more complex, which poses many challenges to existing statistical modeling. Statistically efficient approaches are therefore needed to mine the underlying metabolite information contained in metabolomics data. In this work, we developed a novel kernel Fisher discriminant analysis (KFDA) algorithm by constructing an informative kernel based on a decision tree ensemble. The constructed kernel can effectively encode the similarities between metabolomics samples with respect to informative metabolites/biomarkers in specific parts of the measurement space. Simultaneously, informative metabolites or potential biomarkers can be discovered by variable importance ranking while the kernel is built. Moreover, with such a kernel, KFDA can also handle nonlinear relationships in the metabolomics data to some extent. Finally, two real metabolomics datasets and one simulated dataset were used to demonstrate the performance of the proposed approach in comparison with other approaches.
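One common way to build a kernel from a decision tree ensemble is the proximity kernel: the fraction of trees in which two samples land in the same leaf. The Python sketch below assumes this construction (the paper's exact kernel may differ) and pairs it with textbook two-class regularized KFDA, in which the discriminant coefficients solve (N + mu I) alpha = m_1 - m_2. Leaf assignments are taken as input, so no tree library is needed.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def tree_ensemble_kernel(leaf_ids: Sequence[Sequence[int]]) -> Matrix:
    """leaf_ids[t][i] is the leaf reached by sample i in tree t; the kernel is
    the fraction of trees in which two samples share a leaf."""
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                counts[i][j] += leaves[i] == leaves[j]
    return [[c / n_trees for c in row] for row in counts]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (A is copied, not modified)."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def kfda_coefficients(K: Matrix, labels: Sequence[int], mu: float = 1e-3) -> List[float]:
    """Two-class KFDA: solve (N + mu*I) alpha = m_1 - m_2 (classes sorted)."""
    n = len(K)
    c1, c2 = sorted(set(labels))
    idx = {c: [i for i in range(n) if labels[i] == c] for c in (c1, c2)}
    # m[c][a]: mean kernel value between sample a and the members of class c.
    m = {c: [sum(K[a][j] for j in idx[c]) / len(idx[c]) for a in range(n)]
         for c in (c1, c2)}
    N = [[mu if a == b else 0.0 for b in range(n)] for a in range(n)]
    for c in (c1, c2):
        nc = len(idx[c])
        for a in range(n):
            for b in range(n):
                # Within-class scatter K_c (I - 1/n_c) K_c^T, entry (a, b).
                N[a][b] += sum(K[a][j] * K[b][j] for j in idx[c]) - nc * m[c][a] * m[c][b]
    return solve(N, [m[c1][a] - m[c2][a] for a in range(n)])


def project(K_rows: Matrix, alpha: Sequence[float]) -> List[float]:
    """Discriminant scores; K_rows[i][j] = k(sample i, training sample j)."""
    return [sum(a * k for a, k in zip(alpha, row)) for row in K_rows]
```

Because N + mu I is positive definite, the mean score of the first class is always higher than that of the second on the training data.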


Subjects
Discriminant Analysis; Metabolomics; Algorithms; Biomarkers/metabolism; Blood Glucose/analysis; Decision Trees; Diabetes Mellitus/metabolism; Humans
8.
Analyst ; 136(5): 947-54, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21157593

ABSTRACT

Data from high-throughput metabolomics experiments have become larger and more complex, which poses a number of challenges to existing statistical modeling. There is therefore a need for a statistically efficient approach to mining the underlying metabolite information contained in metabolomics data. In this work, we provide a new strategy based on Monte Carlo cross-validation coupled with the classification tree algorithm, termed the MCTree approach. The MCTree approach provides a feasible way to uncover the predictive structure of metabolomics data by establishing many cross-predictive models. The sample proximity matrix thus obtained appears to give some interesting insights into metabolomics data. Simultaneously, informative metabolites or potential biomarkers can be discovered by means of variable importance ranking in the MCTree approach. Finally, two real metabolomics datasets are used to demonstrate the performance of the proposed approach.
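A sample proximity matrix can be accumulated over Monte Carlo cross-validation runs, as sketched below in Python. The hypothetical `fit_predict(train, test)` callback stands for training a classification tree and reporting each test sample's predicted class and leaf; defining proximity as the share of co-tested runs in which two samples share a leaf is an assumption, not necessarily the MCTree definition.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical callback: train a classification tree on `train` and return,
# for each index in `test` (same order), a (predicted label, leaf id) pair.
FitPredict = Callable[[List[int], List[int]], List[Tuple[int, int]]]


def mc_tree(labels: Sequence[int], fit_predict: FitPredict, n_splits: int = 500,
            test_frac: float = 0.3, seed: int = 0
            ) -> Tuple[List[Optional[float]], List[List[Optional[float]]]]:
    """Return per-sample error rates and a proximity matrix; entries are None
    for samples (or pairs) that were never in the test set together."""
    n = len(labels)
    rng = random.Random(seed)
    n_test = max(1, int(test_frac * n))
    wrong, tested = [0] * n, [0] * n
    same_leaf = [[0] * n for _ in range(n)]
    co_tested = [[0] * n for _ in range(n)]
    for _ in range(n_splits):
        perm = rng.sample(range(n), n)  # random permutation = random split
        test, train = perm[:n_test], perm[n_test:]
        preds = fit_predict(train, test)
        for i, (pred, _) in zip(test, preds):
            tested[i] += 1
            wrong[i] += pred != labels[i]
        for a, (_, leaf_a) in zip(test, preds):
            for b, (_, leaf_b) in zip(test, preds):
                co_tested[a][b] += 1
                same_leaf[a][b] += leaf_a == leaf_b
    error_rate = [w / t if t else None for w, t in zip(wrong, tested)]
    proximity = [[s / c if c else None for s, c in zip(srow, crow)]
                 for srow, crow in zip(same_leaf, co_tested)]
    return error_rate, proximity
```

Samples with high error rates, or with high proximity to the other class, are the ones such a matrix would bring to attention.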


Subjects
High-Throughput Screening Assays/methods; Metabolomics/methods; Monte Carlo Method; Algorithms; Models, Biological; Models, Statistical; Principal Component Analysis
9.
J Comput Chem ; 31(3): 592-602, 2010 Feb.
Article in English | MEDLINE | ID: mdl-19530115

ABSTRACT

A crucial step in building a high-performance QSAR/QSPR model is the detection of outliers. Detecting outliers in a multivariate point cloud is not trivial, especially when several outliers coexist. Classical identification methods do not always identify them, because they are based on the sample mean and covariance matrix, which are themselves influenced by the outliers. Moreover, existing methods focus on certain types of outliers rather than all of them. To avoid these problems and detect all kinds of outliers simultaneously, we provide a new strategy based on Monte Carlo cross-validation, termed the MC method. The MC method provides a feasible way to detect different kinds of outliers by establishing many cross-predictive models. The distribution of predictive residuals thus obtained appears to reduce the risk caused by the masking effect. In addition, a new display is proposed in which the absolute values of the means of the predictive residuals are plotted against their standard deviations. This plot divides the data into normal samples, y-direction outliers and X-direction outliers. Several examples are used to demonstrate the detection ability of the MC method in comparison with other diagnostic methods.
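The residual statistics behind the MC display can be sketched as follows: repeatedly split the data at random, fit on the training part, and collect each sample's out-of-sample residuals. For brevity the model here is a one-variable least-squares line standing in for the multivariate QSAR/QSPR model, and the split ratio and number of runs are arbitrary. A large |mean residual| points to a y-direction outlier, while a large standard deviation points to an X-direction outlier.

```python
import random
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope (stand-in for a PLS model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def mc_residuals(x: Sequence[float], y: Sequence[float], n_splits: int = 1000,
                 train_frac: float = 0.8, seed: int = 0
                 ) -> List[Optional[Tuple[float, float]]]:
    """(mean, std) of each sample's predictive residuals over random splits;
    None for a sample that never landed in the test set."""
    n = len(x)
    rng = random.Random(seed)
    n_train = int(train_frac * n)
    res: List[List[float]] = [[] for _ in range(n)]
    for _ in range(n_splits):
        perm = rng.sample(range(n), n)
        train, test = perm[:n_train], perm[n_train:]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res[i].append(y[i] - (a + b * x[i]))
    return [(mean(r), pstdev(r)) if r else None for r in res]
```

Plotting `abs(mean)` against `std` for every sample gives the display described in the abstract.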


Subjects
Monte Carlo Method; Quantitative Structure-Activity Relationship; Calibration; Computer Simulation; Models, Statistical; Predictive Value of Tests; Regression Analysis
10.
J Pharm Biomed Anal ; 43(4): 1507-13, 2007 Mar 12.
Article in English | MEDLINE | ID: mdl-17118604

ABSTRACT

A fast, accurate, sensitive, selective and reliable method using reversed-phase high-performance liquid chromatography-mass spectrometry with an electrospray ionization interface was developed and validated for the determination of finasteride in human plasma. After deproteinization with acetonitrile, centrifugation, evaporation to dryness and reconstitution in mobile phase, satisfactory separation was achieved on a Hypersil-Keystone C18 reversed-phase column using a mobile phase of acetonitrile-water (46:54, v/v) containing 0.1% acetic acid and 0.1% trifluoroacetic acid. Carbamazepine was used as the internal standard (IS). The [M+H]+ ions of finasteride and the IS were monitored at m/z 373 and 237, respectively, in selective ion monitoring (SIM) mode. The calibration curve was linear in the range 0.2-120 ng mL⁻¹. The limit of quantification for finasteride in plasma was 0.2 ng mL⁻¹, with good accuracy and precision. The intra-assay precision and accuracy were in the ranges 2.1-11.2% and -1.3% to 8.5%, respectively. The inter-assay precision and accuracy were in the ranges 3.4-12.1% and -1.5% to 11.5%, respectively. The mean extraction recoveries were higher than 85% for finasteride and 74% for the IS. The assay has been successfully used to estimate the pharmacokinetics of finasteride after oral administration of a 5 mg finasteride tablet to 24 healthy volunteers.
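The precision and accuracy figures quoted above are assumed to follow the usual bioanalytical convention: precision as the relative standard deviation (RSD) of replicate QC measurements, and accuracy as the relative error of their mean from the nominal concentration (hence the signed accuracy ranges). A minimal Python sketch of those two formulas, with a hypothetical function name:

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def precision_accuracy(measured: Sequence[float], nominal: float) -> Tuple[float, float]:
    """Return (precision as RSD %, accuracy as relative error %) for one QC level."""
    m = mean(measured)
    rsd = 100.0 * stdev(measured) / m       # sample SD relative to the mean
    rel_error = 100.0 * (m - nominal) / nominal  # signed bias from nominal
    return rsd, rel_error
```

Intra-assay values would come from replicates within one run, inter-assay values from replicates pooled across runs.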


Subjects
Chromatography, Liquid/methods; Enzyme Inhibitors/blood; Enzyme Inhibitors/pharmacokinetics; Finasteride/blood; Finasteride/pharmacokinetics; Spectrometry, Mass, Electrospray Ionization/methods; Administration, Oral; Adolescent; Adult; Carbamazepine/chemistry; Chromatography, Liquid/instrumentation; Enzyme Inhibitors/administration & dosage; Enzyme Inhibitors/chemistry; Finasteride/administration & dosage; Finasteride/chemistry; Humans; Male; Molecular Structure; Reproducibility of Results; Sensitivity and Specificity; Tablets; Time Factors
11.
J Chromatogr A ; 1112(1-2): 171-80, 2006 Apr 21.
Article in English | MEDLINE | ID: mdl-16472540

ABSTRACT

Traditional Chinese herbal medicines (TCHM) contain multiple botanicals, each of which contains many compounds that may be relevant to the medicine's putative activity. Analytical techniques that examine a suite of compounds, including their respective ratios, therefore provide a more rational approach to the authentication and quality assessment of TCHM. In this paper we present several examples of applying chromatographic fingerprint analysis to determine the identity, stability and consistency of TCHM and to identify adulterants: (1) species authentication of various ginseng species (Panax ginseng, Panax quinquefolium, Panax notoginseng) and stability of ginseng preparations using high-performance thin-layer chromatography (HPTLC) fingerprint analysis; (2) batch-to-batch consistency of extracts of Total Glycosides of Peony (TGP), used as a raw material and in finished products (TGP powdered extract products), using high-performance liquid chromatography (HPLC) fingerprint analysis with a pattern recognition software interface (CASE); (3) documentation of representative HPLC fingerprints of Immature Fruits of Terminalia chebula (IFTC) through the assessment of the raw material, in-process assays of the extracts and analysis of the finished product (tablets); (4) an HPLC fingerprint study demonstrating the consistent quality of the total flavonoids of commercial extracts of ginkgo (Ginkgo biloba) leaves (EGb), along with the detection of adulterations. The experimental conditions and general comments on the application of chromatographic fingerprint analysis are discussed.
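Fingerprint consistency is often quantified by a similarity score between a sample's peak areas and those of a reference fingerprint. The Python sketch below shows two common choices, the cosine (congruence) coefficient and Pearson's correlation; whether the CASE software uses either of them is not stated in the abstract, and the function names are hypothetical. Both scores ignore overall concentration but respond to changes in the ratios between peaks, which is the property the abstract emphasizes.

```python
import math
from statistics import mean
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Congruence coefficient between two fingerprints (areas of matched peaks)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def pearson_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Correlation coefficient: the cosine of the mean-centered fingerprints."""
    ma, mb = mean(a), mean(b)
    return cosine_similarity([x - ma for x in a], [y - mb for y in b])
```

A batch diluted twofold scores 1.0 against its reference, whereas swapping two peak areas lowers the score.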


Subjects
Chromatography, High Pressure Liquid/methods; Chromatography, Thin Layer/methods; Drugs, Chinese Herbal/chemistry; Quality Assurance, Health Care/methods; Drug Stability; Drugs, Chinese Herbal/standards; Humans; Pattern Recognition, Automated; Plant Extracts/chemistry; Plant Extracts/standards
12.
Anal Bioanal Chem ; 380(4): 643-9, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15448963

ABSTRACT

A simple, fast, sensitive and reproducible isocratic liquid chromatography-mass spectrometry (LC-MS) method with an atmospheric pressure chemical ionization (APCI) interface is presented for the simultaneous separation and determination of L-arginine (ARG) and its methylated metabolites, N-monomethyl-L-arginine (MMA), NG,NG-dimethylarginine (asymmetric dimethylarginine, ADMA) and NG,N'G-dimethylarginine (symmetric dimethylarginine, SDMA), in human plasma. No sample pretreatment other than deproteinization with 5-sulfosalicylic acid (5-SSA) is required. Satisfactory chromatographic separation was achieved on a 2.0 × 150 mm Shimadzu VP-ODS column using a mobile phase of water/acetonitrile (90/10, v/v) containing 0.5% trifluoroacetic acid (TFA). Positive selective ion monitoring (SIM) mode was chosen for the quantification of each analyte. The protonated molecular ions [M+H]+ of ARG, MMA, ADMA and SDMA were monitored at m/z 175, 189, 203 and 203, respectively. L-Homoarginine was used as the internal standard (IS). The limits of quantification (LOQs) were 1.0 µmol L⁻¹ for ARG and 0.2 µmol L⁻¹ for MMA, ADMA and SDMA. The inter-assay precision and accuracy were in the ranges 1.8-4.9% and -3.0% to 5.0%, respectively. The intra-assay precision and accuracy were in the ranges 1.7-4.6% and -2.6% to 4.0%, respectively. The recoveries were between 90.0% and 106.6%. The levels of ARG, MMA, ADMA and SDMA in human plasma were also determined using the developed method.
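The SIM channels listed above can be cross-checked from the molecular formulas by summing monoisotopic element masses and adding the proton mass. This Python sketch uses approximate standard monoisotopic values and hypothetical names; it also shows why ADMA and SDMA, which are isomers (C8H18N4O2), share m/z 203 and must therefore be resolved chromatographically.

```python
from typing import Dict

# Approximate monoisotopic masses (u) of the most abundant isotopes, and the proton.
MONO: Dict[str, float] = {"C": 12.0, "H": 1.00782503207,
                          "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812

# Molecular formulas of the analytes as element counts.
ANALYTES: Dict[str, Dict[str, int]] = {
    "ARG": {"C": 6, "H": 14, "N": 4, "O": 2},
    "MMA": {"C": 7, "H": 16, "N": 4, "O": 2},
    "ADMA": {"C": 8, "H": 18, "N": 4, "O": 2},
    "SDMA": {"C": 8, "H": 18, "N": 4, "O": 2},
}


def mz_protonated(formula: Dict[str, int]) -> float:
    """Monoisotopic m/z of the singly charged [M+H]+ ion."""
    return sum(MONO[el] * n for el, n in formula.items()) + PROTON
```

Rounding `mz_protonated` for each entry of `ANALYTES` gives 175, 189, 203 and 203, matching the monitored ions.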


Subjects
Arginine/blood; Arginine/metabolism; Chromatography, High Pressure Liquid/methods; Mass Spectrometry/methods; Arginine/chemistry; Humans; Methylation; Molecular Structure
13.
J Pharm Biomed Anal ; 33(5): 1155-62, 2003 Dec 04.
Article in English | MEDLINE | ID: mdl-14656607

ABSTRACT

A simple and rapid isocratic LC/MS method with electrospray ionization (ESI) was developed for the simultaneous separation and determination of adenine, hypoxanthine, adenosine and cordycepin in Cordyceps sinensis (Cs) and its substitutes. 2-Chloroadenosine was used as the internal standard for this assay. Optimum separation of the analytes was achieved using a mixture of water, methanol and formic acid (85:14:1, v/v/v) as the mobile phase and a 2.0 × 150 mm Shimadzu VP-ODS column. Selective ion monitoring (SIM) mode ([M+H]+ at m/z 136, 137, 268 and 252 for adenine, hypoxanthine, adenosine and cordycepin, and at m/z 302 for the internal standard) was used for the quantitative analysis of the four active components. The regression equations were linear in the ranges 1.4-140.0 µg mL⁻¹ for adenine, 0.6-117.5 µg mL⁻¹ for hypoxanthine, 0.5-128.5 µg mL⁻¹ for adenosine and 0.5-131.5 µg mL⁻¹ for cordycepin. The limits of quantitation (LOQ) and detection (LOD) were, respectively, 1.4 and 0.5 µg mL⁻¹ for adenine, 0.6 and 0.2 µg mL⁻¹ for hypoxanthine, and 0.5 and 0.1 µg mL⁻¹ for adenosine and cordycepin. The recoveries of the four constituents ranged from 93.5% to 107.0%. The nucleoside contents of various types of natural Cs and its substitutes were determined and compared using the developed method.
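The linear calibration and recovery figures rest on ordinary least squares. The Python sketch below fits response = slope × concentration + intercept, back-calculates a concentration from a response, and computes a spike recovery; the function names are hypothetical, and the recovery formula ((found - endogenous) / added × 100) is the usual convention, assumed rather than stated in the abstract.

```python
from statistics import mean
from typing import Sequence, Tuple


def calibrate(conc: Sequence[float], response: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line response = slope * conc + intercept; also returns r^2."""
    mx, my = mean(conc), mean(response)
    sxx = sum((x - mx) ** 2 for x in conc)
    syy = sum((y - my) ** 2 for y in response)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, response))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)


def back_calculate(response: float, slope: float, intercept: float) -> float:
    """Concentration corresponding to a measured response (e.g. area ratio to IS)."""
    return (response - intercept) / slope


def recovery_percent(found: float, endogenous: float, added: float) -> float:
    """Share of a spiked amount that is measured back, in percent."""
    return 100.0 * (found - endogenous) / added
```

A back-calculated value is only meaningful inside the validated linear range, e.g. 1.4-140.0 µg mL⁻¹ for adenine.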


Subjects
Cordyceps/chemistry; Cordyceps/isolation & purification; Spectrometry, Mass, Electrospray Ionization/methods; Adenine/analysis; Adenosine/analysis; Cordyceps/growth & development; Deoxyadenosines/analysis; Drugs, Chinese Herbal/analysis; Gas Chromatography-Mass Spectrometry/methods; Hypoxanthine/analysis; Nucleosides/analysis; Principal Component Analysis/methods