ABSTRACT
Mass spectrometry is a ubiquitous technique capable of complex chemical analysis. The fragmentation patterns that appear in mass spectra are an excellent target for artificial intelligence methods that automate and expedite data analysis, for example to identify functional groups. To develop this approach, we trained models on electron ionization mass spectra (electron ionization being a reproducible, hard fragmentation technique) so that we could evaluate not only the final model accuracies but also the reasoning behind model assignments. The convolutional neural network (CNN) models were trained on 2D images of the spectra using transfer learning from Inception V3, and the logistic regression models were trained on array-based data using the scikit-learn implementation in Python. Our training dataset consisted of 21,166 mass spectra from the United States' National Institute of Standards and Technology (NIST) WebBook. The data were used to train models to identify both specific functional groups (e.g., amines, esters) and generalized classifications (aromatics, oxygen-containing functional groups, and nitrogen-containing functional groups). The highest final accuracies on new data were obtained with logistic regression rather than with transfer learning on CNN models. We also determined that the mass range most beneficial for functional group analysis is m/z 0 to 100. In addition, we correctly identified the functional groups of example molecules selected from both the NIST database and experimental data. Beyond functional group analysis, we have also developed a methodology to identify the fragments most important for accurate detection of the models' targets. The results demonstrate a potential pathway for analyzing and screening large amounts of mass spectral data.
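The logistic-regression route can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`bin_spectrum`, `train_group_model`, `impactful_fragments`), the rounding of peaks to integer m/z bins, the base-peak normalization, the one-binary-model-per-functional-group layout, and the use of the largest positive coefficients as a stand-in for "impactful fragments" are all hypothetical choices. The spectra are randomly generated toy data with an arbitrary marker peak at m/z 30, not NIST data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

MAX_MZ = 100  # assumption: keep only the m/z 0 to 100 window


def bin_spectrum(peaks, max_mz=MAX_MZ):
    """Hypothetical helper: turn {m/z: intensity} into a fixed-length vector.

    Peaks are rounded to integer m/z, peaks above max_mz are dropped, and the
    vector is scaled so the largest retained peak equals 1.0.
    """
    vec = np.zeros(max_mz + 1)
    for mz, intensity in peaks.items():
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            vec[idx] += intensity
    top = vec.max()
    return vec / top if top > 0 else vec


def train_group_model(spectra, labels, max_mz=MAX_MZ):
    """Fit one binary logistic-regression classifier for a single functional group."""
    X = np.vstack([bin_spectrum(s, max_mz) for s in spectra])
    model = LogisticRegression(max_iter=1000)
    model.fit(X, np.asarray(labels))
    return model


def impactful_fragments(model, top_k=3):
    """Return the m/z bins whose coefficients push hardest toward the positive class."""
    order = np.argsort(model.coef_[0])[::-1][:top_k]
    return [int(i) for i in order]  # bin index equals integer m/z


# Toy data (not NIST spectra): every other spectrum carries an arbitrary
# marker peak at m/z 30 and is labeled as containing the group.
rng = np.random.default_rng(0)
spectra, labels = [], []
for i in range(40):
    peaks = {int(m): float(rng.uniform(0.05, 0.3)) for m in rng.integers(15, 100, size=5)}
    has_group = i % 2 == 0
    if has_group:
        peaks[30] = 1.0
    spectra.append(peaks)
    labels.append(int(has_group))

model = train_group_model(spectra, labels)
print(impactful_fragments(model))
```

Ranking coefficients is only one simple way to expose which m/z bins drive a linear model's assignment; it is used here because it needs nothing beyond the fitted model.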
ABSTRACT
The chemistry and structure of the air-ocean interface modulate biogeochemical processes between the ocean and atmosphere and therefore affect sea spray aerosol properties, cloud and ice nucleation, and climate. Protein macromolecules are enriched in the sea surface microlayer and have complex adsorption properties due to their unique molecular balance of hydrophobicity and hydrophilicity. Additionally, the interfacial adsorption properties of proteins are of interest as important inputs for ocean climate modeling. Bovine serum albumin is used here as a model protein to investigate the dynamic surface behavior of proteins under several conditions, including solution ionic strength, temperature, and the presence of a stearic acid (C17COOH) monolayer at the air-water interface. Key vibrational modes of bovine serum albumin are examined via infrared reflectance-absorbance spectroscopy, a specular reflection method that ratios out the solution phase and highlights the aqueous surface, to determine at a molecular level the surface structural changes and the factors affecting adsorption to the solution surface. Amide band reflection-absorption intensities reveal the extent of protein adsorption under each set of conditions. The studies reveal the nuanced effect of ocean-relevant sodium concentrations on protein adsorption. Moreover, protein adsorption is most strongly affected by the synergistic effects of divalent cations and increased temperature.
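The "ratioing out" of the solution phase described in this abstract is conventionally expressed through the reflectance-absorbance (RA) quantity of infrared reflection-absorption measurements. As a reference sketch, assuming R is the reflectivity measured from the protein-containing (or monolayer-covered) solution surface and R_0 is the reflectivity of a protein-free reference subphase recorded under the same conditions:

```latex
\mathrm{RA} = -\log_{10}\!\left(\frac{R}{R_0}\right)
```

Because the bulk water contributes nearly equally to R and R_0, its absorption largely cancels in the ratio, leaving bands (such as the amide bands) that arise from species at the surface.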
Subject(s)
Bovine Serum Albumin, Water, Bovine Serum Albumin/chemistry, Water/chemistry, Adsorption, Temperature, Cations, Surface Properties
ABSTRACT
Fourier transform infrared spectroscopy (FTIR) is a ubiquitous spectroscopic technique. Spectral interpretation is time-consuming, but it yields important information about the functional groups present in compounds and complex substances. We develop a generalizable model using a machine learning (ML) algorithm based on convolutional neural networks (CNNs) to identify the presence of functional groups in gas-phase FTIR spectra. The ML models reduce the time required to analyze functional groups and facilitate interpretation of FTIR spectra. Through web scraping, we acquire intensity-frequency data for 8728 gas-phase organic molecules from the NIST spectral database and transform the data into spectral images. We successfully train models for 15 of the most common organic functional groups and then evaluate them by identifying functional groups in spectra that were not used during training. These models serve to expand the application of FTIR measurements to facile analysis of organic samples. Our approach yields broad functional group models that infer in tandem to provide a full interpretation of a spectrum. We present the first implementation of ML using image-based CNNs for predicting functional groups from a spectroscopic method.
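One way to turn scraped intensity-frequency pairs into spectral images, and to run several per-group classifiers on the same image in tandem, is sketched with NumPy. The rasterization scheme (equal-width wavenumber columns over an assumed 400 to 4000 cm^-1 window, min-max scaled absorbance, columns filled from the bottom), the image size, the helper names `spectrum_to_image` and `interpret`, and the 0.5 decision threshold are hypothetical. The `toy_models` lambdas are placeholders standing in for trained CNN classifiers and do not reflect how the published models decide.

```python
import numpy as np


def spectrum_to_image(wavenumbers, absorbance, width=256, height=128,
                      wn_range=(400.0, 4000.0)):
    """Hypothetical helper: rasterize an (x, y) spectrum into a binary image.

    Each column covers an equal slice of wn_range. A column is filled from the
    bottom up to the strongest (min-max scaled) absorbance in that slice.
    """
    wn = np.asarray(wavenumbers, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    span = ab.max() - ab.min()
    ab = (ab - ab.min()) / span if span > 0 else np.zeros_like(ab)

    lo, hi = wn_range
    cols = np.floor((wn - lo) * width / (hi - lo)).astype(int)
    keep = (cols >= 0) & (cols < width)  # drop points outside the window

    column_height = np.zeros(width)
    np.maximum.at(column_height, cols[keep], ab[keep])  # strongest point per column

    image = np.zeros((height, width), dtype=np.uint8)
    filled = np.round(column_height * height).astype(int)
    for c, n in enumerate(filled):
        image[height - n:, c] = 1  # row 0 is the top; n == 0 fills nothing
    return image


def interpret(image, models, threshold=0.5):
    """Run every per-group model on the same image and keep the positive calls."""
    return sorted(name for name, predict in models.items()
                  if predict(image) >= threshold)


# Toy stand-ins for trained classifiers: each just reads one pixel column.
toy_models = {
    "carbonyl": lambda img: float(img[:, 13].mean()),  # column spanning 1700-1800 cm^-1
    "alkane": lambda img: float(img[:, 25].mean()),    # column spanning 2900-3000 cm^-1
}

img = spectrum_to_image([1050.0, 1750.0, 2950.0], [0.0, 1.0, 0.3], width=36, height=10)
print(interpret(img, toy_models))  # prints ['carbonyl']
```

Keeping one binary model per functional group, as in `interpret`, lets each model be trained and thresholded independently while still producing a combined reading of a single spectrum.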