
Evaluation of Multivariate Classification Models for Analyzing NMR Metabolomics Data.

Vu, Thao; Siemek, Parker; Bhinderwala, Fatema; Xu, Yuhang; Powers, Robert.

J Proteome Res; 18(9): 3282-3294, 2019 Sep 06.

Article in English | MEDLINE | ID: mdl-31382745

ABSTRACT

Analytical techniques such as NMR and mass spectrometry can generate large metabolomics data sets containing thousands of spectral features derived from numerous biological observations. Multivariate data analysis is routinely used to uncover the underlying biological information contained within these large metabolomics data sets. This is typically accomplished by classifying the observations into groups (e.g., control versus treated) and by identifying associated discriminating features. There are a variety of classification models to select from, which include some well-established techniques (e.g., principal component analysis [PCA], orthogonal projection to latent structure [OPLS], or partial least-squares projection to latent structures [PLS]) and newly emerging machine learning algorithms (e.g., support vector machines or random forests). However, it is unclear which classification model, if any, is an optimal choice for the analysis of metabolomics data. Herein, we present a comprehensive evaluation of five common classification models that are routinely employed in the metabolomics field and currently available in our MVAPACK metabolomics software package. Simulated and experimental NMR data sets with various levels of group separation were used to evaluate each model. Model performance was assessed by classification accuracy rate, by the area under a receiver operating characteristic (AUROC) curve, and by the identification of true discriminating features. Our findings suggest that the five classification models perform equally well with robust data sets. Only when the models are stressed with subtle data set differences does OPLS emerge as the best-performing model. OPLS maintained a high prediction accuracy rate and a large area under the ROC curve while yielding loadings closest to the true loadings when group separation was limited.
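As a rough illustration of two of the performance measures named in the abstract, the sketch below computes a classification accuracy rate and an AUROC from per-observation classifier scores. It is not the authors' code and does not use MVAPACK; the function names, the score threshold, and the example labels and scores are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.0) -> float:
    """Fraction of observations whose thresholded score matches the true label.

    Labels are 0 (e.g., control) or 1 (e.g., treated); a score above
    `threshold` is predicted as class 1. The threshold is an assumption.
    """
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(int(p == t) for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen class-1 observation
    scores higher than a randomly chosen class-0 observation, with ties
    counted as one half.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one observation per class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six observations (three per group).
labels = [0, 0, 0, 1, 1, 1]
scores = [-1.2, -0.4, 0.3, -0.1, 0.8, 1.5]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUROC    = {auroc(labels, scores):.3f}")
```

In this made-up example, two of the six observations fall on the wrong side of the threshold, while eight of the nine treated/control pairs are ranked correctly. The two measures can therefore differ on the same set of scores.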

Subject(s)

Magnetic Resonance Spectroscopy/methods, Mass Spectrometry/methods, Metabolomics/methods, Biomolecular Nuclear Magnetic Resonance/methods, Algorithms, Discriminant Analysis, Least-Squares Analysis, Magnetic Resonance Spectroscopy/statistics & numerical data, Mass Spectrometry/statistics & numerical data, Metabolomics/statistics & numerical data, Multivariate Analysis, Principal Component Analysis, Support Vector Machine
