CCS Predictor 2.0: An Open-Source Jupyter Notebook Tool for Filtering Out False Positives in Metabolomics.

Rainey, Markace A; Watson, Chandler A; Asef, Carter K; Foster, Makayla R; Baker, Erin S; Fernández, Facundo M

Affiliations

Rainey MA; School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Dr. NW, Atlanta, Georgia 30332, United States.
Watson CA; School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Dr. NW, Atlanta, Georgia 30332, United States.
Asef CK; School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Dr. NW, Atlanta, Georgia 30332, United States.
Foster MR; Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States.
Baker ES; Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States.
Fernández FM; Comparative Medicine Institute, North Carolina State University, Raleigh, North Carolina 27695, United States.

Anal Chem; 94(50): 17456-17466, 2022 Dec 20.

Article in English | MEDLINE | ID: mdl-36473057

ABSTRACT

Metabolite annotation remains a widely recognized bottleneck in nontargeted metabolomics workflows. Annotation typically relies on a combination of high-resolution mass spectrometry (MS) with parent and tandem measurements, isotope cluster evaluation, and Kendrick mass defect (KMD) analysis. Chromatographic retention time matching with standards is often used at later stages of the process and can be followed by metabolite isolation and structure confirmation by nuclear magnetic resonance (NMR) spectroscopy. The measurement of gas-phase collision cross-section (CCS) values by ion mobility (IM) spectrometry adds an important dimension to this workflow by providing an additional molecular parameter that can be used to filter out unlikely structures. Because IM separations occur on a millisecond timescale, CCS values can be measured rapidly and IM pairs easily with existing MS workflows. Here, we report a highly accurate machine learning tool (CCSP 2.0), distributed as an open-source Jupyter Notebook, that predicts CCS values using linear support vector regression models. The tool lets users customize the training set to their needs, enabling models for new adducts or previously unexplored molecular classes. CCSP 2.0 produces predictions with accuracy equal to or greater than that of existing machine learning approaches such as CCSbase, DeepCCS, and AllCCS, while being better aligned with FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Another distinctive aspect of CCSP 2.0 is its use of a large library of 1613 molecular descriptors from the Mordred Python package, which further encodes the fine structural details that distinguish isomers. CCS prediction accuracy was tested against CCS values in the McLean CCS Compendium, yielding median relative errors of 1.25, 1.73, and 1.87% for the 170 [M − H]⁻, 155 [M + H]⁺, and 138 [M + Na]⁺ ions tested, respectively.
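To make the modeling step concrete, the sketch below fits a linear support vector regression (SVR) model that maps molecular descriptor vectors to CCS values, then scores it with the median relative error metric quoted above. It is a hedged, from-scratch illustration and not CCSP 2.0's code: the subgradient-descent solver, the names `fit_linear_svr` and `median_relative_error`, the default hyperparameters, and the toy data are all assumptions. In a real workflow the descriptor vectors would come from a package such as Mordred (1613 descriptors per molecule), and a library SVR implementation would normally replace this hand-written solver.

```python
import statistics
from typing import Callable, Sequence


def fit_linear_svr(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    epsilon: float = 0.5,  # half-width of the insensitive tube, in CCS units (Å²)
    lam: float = 1e-4,     # L2 regularization strength on the weights
    lr: float = 0.05,      # initial learning rate
    epochs: int = 500,
) -> Callable[[Sequence[float]], float]:
    """Fit a linear SVR by per-sample subgradient descent on the primal objective
    (lam/2)*||w||^2 + mean(max(0, |y - (w.z + b)| - epsilon)),
    where z is the column-standardized descriptor vector. Returns predict(x)."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    Z = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

    w = [0.0] * len(means)
    b = statistics.median(y)  # start the intercept at the median target
    for epoch in range(epochs):
        eta = lr / (1.0 + 0.01 * epoch)  # slowly decaying step size
        for z, target in zip(Z, y):
            residual = target - (sum(wj * zj for wj, zj in zip(w, z)) + b)
            # Subgradient of the epsilon-insensitive loss w.r.t. the prediction:
            # zero inside the tube, +/-1 outside it.
            if residual > epsilon:
                g = -1.0
            elif residual < -epsilon:
                g = 1.0
            else:
                g = 0.0
            w = [wj - eta * (lam * wj + g * zj) for wj, zj in zip(w, z)]
            b -= eta * g  # the intercept is not regularized

    def predict(x: Sequence[float]) -> float:
        z = [(v - m) / s for v, m, s in zip(x, means, stds)]
        return sum(wj * zj for wj, zj in zip(w, z)) + b

    return predict


def median_relative_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Median of 100 * |predicted - reference| / reference, in percent."""
    return statistics.median(
        100.0 * abs(p - r) / r for p, r in zip(predicted, reference)
    )


if __name__ == "__main__":
    # Hypothetical toy data: one "descriptor" per molecule and a CCS (Å²)
    # that is exactly linear in it. Real inputs would be descriptor vectors.
    X_train = [[float(x)] for x in range(10)]
    y_train = [200.0 + 5.0 * x for x in range(10)]
    model = fit_linear_svr(X_train, y_train)
    preds = [model(x) for x in X_train]
    print(f"median relative error: {median_relative_error(preds, y_train):.3f}%")
```

Standardizing each descriptor column matters in this sketch because descriptors of the Mordred kind span very different numeric ranges; without it, a single large-valued descriptor would dominate the gradient steps.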
For superclass-matched data sets, CCS predictions from CCSP 2.0 allowed 36.1% of incorrect structures to be filtered out while retaining 100% of the correct annotations, using a ΔCCS threshold of 2.8% and a mass error tolerance of 10 ppm.
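The filtering step can be sketched as a two-criterion screen: a candidate structure is kept only if its theoretical m/z lies within 10 ppm of the observed m/z and its predicted CCS lies within 2.8% of the measured CCS. This is a minimal illustration rather than CCSP 2.0's implementation; the `Candidate` type, the function names, the toy values, and the convention of expressing ΔCCS relative to the measured CCS are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A hypothetical candidate annotation for one measured feature."""
    name: str
    theoretical_mz: float
    predicted_ccs: float  # e.g., from a CCS prediction model, in Å²


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million relative to the theoretical m/z."""
    return 1e6 * (observed_mz - theoretical_mz) / theoretical_mz


def delta_ccs_percent(predicted_ccs: float, measured_ccs: float) -> float:
    """ΔCCS in percent, assuming the measured CCS is the reference value."""
    return 100.0 * (predicted_ccs - measured_ccs) / measured_ccs


def filter_candidates(
    candidates: List[Candidate],
    observed_mz: float,
    measured_ccs: float,
    max_ppm: float = 10.0,
    max_delta_ccs: float = 2.8,
) -> List[Candidate]:
    """Keep candidates that fall inside both the mass-error and ΔCCS windows."""
    return [
        c
        for c in candidates
        if abs(ppm_error(observed_mz, c.theoretical_mz)) <= max_ppm
        and abs(delta_ccs_percent(c.predicted_ccs, measured_ccs)) <= max_delta_ccs
    ]


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the paper.
    feature_mz, feature_ccs = 181.0707, 140.0
    pool = [
        Candidate("candidate_A", 181.0710, 141.5),  # passes both windows
        Candidate("candidate_B", 181.0710, 150.0),  # ~7.1% ΔCCS: rejected
        Candidate("candidate_C", 181.0750, 140.5),  # ~-24 ppm: rejected
    ]
    kept = filter_candidates(pool, feature_mz, feature_ccs)
    print([c.name for c in kept])  # prints ['candidate_A']
```

Applying the mass window first and the CCS window second gives the same result as the reverse order, since a candidate must pass both; the CCS window is what removes isobaric candidates that mass accuracy alone cannot distinguish.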

Subjects

Algorithms; Metabolomics; Metabolomics/methods; Mass Spectrometry/methods; Chromatography, High Pressure Liquid; Machine Learning

Collections: 01-internacional | Database: MEDLINE | Main subject: Algorithms / Metabolomics | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Journal: Anal Chem | Publication year: 2022 | Document type: Article | Country of affiliation: United States
