Gaussian Mixture Modeling Extensions for Improved False Discovery Rate Estimation in GC-MS Metabolomics.

Flores, Javier E; Bramer, Lisa M; Degnan, David J; Paurus, Vanessa L; Corilo, Yuri E; Clendinen, Chaevien S

Affiliation

Flores JE; Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
Bramer LM; Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
Degnan DJ; Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
Paurus VL; Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
Corilo YE; Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
Clendinen CS; Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.

J Am Soc Mass Spectrom; 34(6): 1096-1104, 2023 Jun 07.

Article in En | MEDLINE | ID: mdl-37084380

ABSTRACT

The ability to reliably identify small molecules (e.g., metabolites) is key to driving scientific advancement in metabolomics. Gas chromatography-mass spectrometry (GC-MS) is an analytical method that may be applied to facilitate this process. The typical GC-MS identification workflow involves quantifying the similarity of an observed sample spectrum and other features (e.g., retention index) to those of several references, and taking the compound of the best-matching reference spectrum as the identified metabolite. While a deluge of similarity metrics exists, none quantifies the error rate of the generated identifications, leaving an unknown risk of false identification or discovery. To quantify this risk, we propose a model-based framework for estimating the false discovery rate (FDR) among a set of identifications. Extending a traditional mixture modeling framework, our method incorporates both similarity score and experimental information in estimating the FDR. We apply these models to identification lists derived from 548 samples of varying complexity and sample type (e.g., fungal species, standard mixtures), comparing their performance to that of the traditional Gaussian mixture model (GMM). Through simulation, we additionally assess the impact of reference library size on the accuracy of FDR estimates. In comparing the best-performing model extensions to the GMM, our results indicate relative decreases in median absolute estimation error (MAE) ranging from 12% to 70%, based on comparisons of the median MAEs across all hit-lists. These relative performance improvements generally hold regardless of library size; however, FDR estimation error typically worsens as the set of reference compounds diminishes.
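As a rough illustration of the baseline the abstract refers to, the sketch below fits a plain two-component Gaussian mixture to similarity scores and averages the posterior probability of the lower-scoring component over the accepted hits to obtain an FDR estimate. This is a generic mixture-model approach written under stated assumptions, not the authors' implementation, and it does not include the extensions that use experimental information. The function names (`fit_two_component_gmm`, `estimate_fdr`), the score thresholds, and the simulated scores are all hypothetical and chosen only for the example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, weights, means, sds):
    """Posterior probability of each of the two components given score x."""
    p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
    total = p[0] + p[1]
    if total == 0.0:  # both densities underflowed; fall back to the weights
        return list(weights)
    return [p[0] / total, p[1] / total]


def fit_two_component_gmm(scores, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (weights, means, sds) with component 0 having the lower mean,
    which is assumed here to represent incorrect identifications.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    # Initialise the means from the lower and upper halves of the scores.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    grand = sum(ordered) / len(ordered)
    sd = max(math.sqrt(sum((x - grand) ** 2 for x in ordered) / len(ordered)), 1e-3)
    weights, sds = [0.5, 0.5], [sd, sd]
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each score.
        resp = [posteriors(x, weights, means, sds) for x in scores]
        # M-step: update weights, means and standard deviations.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    order = sorted((0, 1), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [sds[k] for k in order])


def estimate_fdr(scores, threshold, params):
    """Mean posterior probability of the 'incorrect' component among accepted hits."""
    weights, means, sds = params
    accepted = [x for x in scores if x >= threshold]
    if not accepted:
        return 0.0
    return sum(posteriors(x, weights, means, sds)[0] for x in accepted) / len(accepted)


# Synthetic scores for illustration only (not data from the study):
# 300 simulated incorrect hits and 200 simulated correct hits.
rng = random.Random(0)
scores = ([rng.gauss(0.4, 0.08) for _ in range(300)]
          + [rng.gauss(0.8, 0.06) for _ in range(200)])
params = fit_two_component_gmm(scores)
fdr_strict = estimate_fdr(scores, 0.7, params)
fdr_lenient = estimate_fdr(scores, 0.3, params)
print("estimated FDR at threshold 0.7:", round(fdr_strict, 4))
print("estimated FDR at threshold 0.3:", round(fdr_lenient, 4))
```

Raising the score threshold shrinks the accepted list and should lower the estimated FDR, which is the trade-off an FDR-controlled identification list is meant to expose.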

Subjects

Metabolomics; Gas Chromatography-Mass Spectrometry/methods; Metabolomics/methods

Keywords

false positive rate; metabolite identification; spectral similarity score


Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Metabolomics Study type: Prognostic_studies Language: En Publication year: 2023 Document type: Article