Development of variance rank initiated-unsupervised sample indexing for gas chromatography-mass spectrometry analysis.

Cain, Caitlin N; Sudol, Paige E; Berrier, Kelsey L; Synovec, Robert E

Affiliation

Cain CN; Department of Chemistry, Box 351700, University of Washington, Seattle, WA, 98195, USA.
Sudol PE; Department of Chemistry, Box 351700, University of Washington, Seattle, WA, 98195, USA.
Berrier KL; Department of Chemistry, Box 351700, University of Washington, Seattle, WA, 98195, USA.
Synovec RE; Department of Chemistry, Box 351700, University of Washington, Seattle, WA, 98195, USA. Electronic address: synovec@chem.washington.edu.

Talanta; 233: 122495, 2021 Oct 01.

Article in English | MEDLINE | ID: mdl-34215113

ABSTRACT

Traditional non-targeted chemometric workflows for gas chromatography-mass spectrometry (GC-MS) data rely on supervised methods, which require a priori knowledge of sample class membership. Herein, we propose a simple, unsupervised chemometric workflow known as variance rank initiated-unsupervised sample indexing (VRI-USI). VRI-USI discovers analyte peaks exhibiting high relative variance across all samples, followed by k-means clustering on the individual peaks. Based upon how the samples cluster for a given peak, a sample index assignment is provided. Using a probabilistic argument, if the same sample index assignment appears for several discovered peaks, then this outcome strongly suggests that the samples are properly classified by that particular sample index assignment. Thus, relevant chemical differences between the samples have been discovered in an unsupervised fashion. The VRI-USI workflow is demonstrated on three increasingly difficult datasets: simulations, yeast metabolomics, and human cancer metabolomics. For simulated GC-MS datasets, VRI-USI discovered 85-90% of analytes modeled to vary between sample classes. Nineteen out of 53 peaks in the peak table developed for the yeast metabolome dataset had the same sample index assignments, indicating that those indices are most likely due to class-distinguishing chemical differences. A t-test revealed that 22 out of 53 peaks were statistically significant (p < 0.05) when using those sample index assignments. Likewise, for the human cancer metabolomics study, VRI-USI discovered 25 analytes that were statistically different (p < 0.05) using the sample index assignments determined to highlight meaningful sample-based differences. For all datasets, the sample index assignments that were deduced from VRI-USI were the correct class-based difference when using prior knowledge.
VRI-USI holds promise as an exploratory data analysis workflow for studies in which analysts do not readily have a priori class information or want to uncover the underlying nature of their dataset.
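
The workflow summarized in the abstract (rank peaks by relative variance, cluster each peak on its own, then look for sample index assignments that recur across peaks) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the variance metric, the number of clusters, or the peak-selection cutoff, so the sketch assumes relative standard deviation as the variance measure, k = 2 clusters, and a hypothetical `top_n` cutoff. The function names and the toy peak table are likewise invented for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_variance(signal):
    """Relative standard deviation of one peak across all samples (assumed metric)."""
    m = mean(signal)
    return pstdev(signal) / m if m > 0 else 0.0


def kmeans_two_clusters(signal, max_iter=100):
    """Lloyd's k-means with k = 2 (assumed) on a single peak, seeded at min and max."""
    lo, hi = min(signal), max(signal)
    labels = [0] * len(signal)
    for _ in range(max_iter):
        # Assign each sample to the nearer of the two centroids.
        new_labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in signal]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute the centroids from the current assignment.
        lo = mean([x for x, g in zip(signal, labels) if g == 0])
        hi = mean([x for x, g in zip(signal, labels) if g == 1])
    return labels


def canonical_index(labels):
    """Relabel so the first sample is always 0, so swapped labels compare equal."""
    if labels[0] == 1:
        labels = [1 - g for g in labels]
    return tuple(labels)


def vri_usi(peak_table, top_n):
    """Return per-peak sample index assignments and how often each one recurs."""
    ranked = sorted(peak_table,
                    key=lambda p: relative_variance(peak_table[p]),
                    reverse=True)
    assignments = {
        p: canonical_index(kmeans_two_clusters(peak_table[p]))
        for p in ranked[:top_n]
    }
    return assignments, Counter(assignments.values())


# Toy peak table (invented): six samples, signal per peak in sample order.
peak_table = {
    "peak_1": [10, 11, 9, 30, 31, 29],
    "peak_2": [50, 52, 49, 20, 21, 19],
    "peak_3": [5, 15, 5, 15, 5, 15],
    "peak_4": [100, 101, 99, 100, 101, 99],
    "peak_5": [8, 7, 9, 2, 3, 2],
}

assignments, counts = vri_usi(peak_table, top_n=4)
for index, n_peaks in counts.most_common():
    print(index, n_peaks)
```

In this toy table, three of the four retained peaks share one sample index assignment, while the low-variance `peak_4` is never clustered. For intuition on why recurrence is informative, n samples can be split into two non-empty groups in 2^(n-1) - 1 ways, so unrelated peaks are unlikely to agree by chance once n is moderately large. Whether this matches the probabilistic argument used in the paper is an assumption.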

Subject(s)

Metabolome; Metabolomics; Cluster Analysis; Gas Chromatography-Mass Spectrometry; Humans; Workflow

Keywords

Chemometrics; Exploratory data analysis; Gas chromatography-mass spectrometry; Unsupervised; Variance rank initiated-unsupervised sample indexing

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Metabolome / Metabolomics | Limits: Humans | Language: En | Journal: Talanta | Year: 2021 | Document type: Article | Affiliation country: United States