Results 1 - 12 of 12
1.
J Chromatogr A ; 1730: 465144, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38996513

ABSTRACT

Ionic liquids, i.e., organic salts with a low melting point, can be used as gas chromatographic liquid stationary phases. These stationary phases have some advantages such as peculiar selectivity, high polarity, and thermostability. Many previous works are devoted to such stationary phases. However, there are still no sufficiently large retention data sets of structurally diverse compounds for them. Consequently, there are very few works devoted to quantitative structure-retention relationships (QSRR) for ionic liquid-based stationary phases. This work is aimed at closing this gap. Three ionic liquids with substituted pyridinium cations are considered. We provide sufficiently large data sets (123-158 compounds) that can be used in further works devoted to QSRR and related methods. We provide a QSRR study using this data set and demonstrate the following. The retention index for a polyethylene glycol stationary phase (denoted as RI_PEG), predicted using another model, can be used as a molecular descriptor. This descriptor significantly improves the accuracy of the QSRR model. Both deep learning-based and linear models were considered for RI_PEG prediction. The ability to predict the retention indices for ionic liquid-based stationary phases with high accuracy is demonstrated. Particular attention is paid to the reproducibility and reliability of the QSRR study. It was demonstrated that small perturbations of the data set, such as adding or removing several compounds, can considerably affect results such as descriptor importance and model accuracy. These facts have to be considered in order to avoid misleading conclusions. For the QSRR research, we developed a software tool with a graphical user interface, which we called CHERESHNYA. It is intended to select molecular descriptors and construct linear equations connecting molecular descriptors with gas chromatographic retention indices for any stationary phase. The software allows the user to generate several hundred molecular descriptors (one-dimensional and two-dimensional). Among them, predicted retention indices for popular stationary phases such as polydimethylsiloxane and polyethylene glycol are used as molecular descriptors. Various methods for selecting (and assessing the importance of) molecular descriptors have been implemented, in particular the Boruta algorithm, partial least squares, genetic algorithms, L1-regularized regression (LASSO), and others. The software is free, open-source, and available online: https://github.com/mtshn/chereshnya.
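
The sensitivity to small data set perturbations described in this abstract can be illustrated with a minimal sketch. It is not code from CHERESHNYA: it assumes a single descriptor (a predicted RI_PEG value), a plain least-squares line, and invented toy numbers, and the function names `fit_line` and `leave_one_out_slopes` are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a * x + b with a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def leave_one_out_slopes(x, y):
    """Refit the line with each compound removed in turn and collect the slopes."""
    slopes = []
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        slopes.append(fit_line(xs, ys)[0])
    return slopes


# Invented toy values for illustration only, not data from the paper.
ri_peg_predicted = [1000.0, 1100.0, 1250.0, 1400.0, 1600.0, 1900.0]
ri_ionic_liquid = [1180.0, 1290.0, 1475.0, 1610.0, 1880.0, 2150.0]

slope, intercept = fit_line(ri_peg_predicted, ri_ionic_liquid)
slopes = leave_one_out_slopes(ri_peg_predicted, ri_ionic_liquid)
print(f"full fit: RI_IL = {slope:.3f} * RI_PEG + {intercept:.1f}")
print(f"leave-one-out slope range: {min(slopes):.3f} to {max(slopes):.3f}")
```

With many correlated descriptors the spread between refits would typically be larger than in this one-descriptor toy case, which is why reporting it alongside the fitted equation may be useful.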

2.
Anal Chim Acta ; 1297: 342375, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38438243

ABSTRACT

BACKGROUND: The NIST retention index database is one of the most widely used sources of retention indices. In both untargeted analysis and machine learning studies, filtering for potential errors is rather lacking or nonexistent. According to our estimates, about 80% of the compounds from both the NIST 17 and NIST 20 retention index databases have only one RI value per stationary phase, which makes searching for erroneous values with statistical methods impossible. Manual inspection is also impractical because the database contains more than 300 000 entries. RESULTS: We suggest a two-step procedure to find potentially erroneous retention indices based on machine learning. The first step is to use five predictive models to obtain predicted retention index values for the whole database. The second one is to compare these predicted values against the experimental ones. We consider a retention index erroneous if its accuracy (the difference between predicted and experimental value) is in the bottom 5% for each of the five models simultaneously. Using this method, we were able to detect 2093 outlier entries for standard and semi-standard non-polar stationary phases in the NIST 17 retention index database; 566 of those were corrected or removed by the developers in NIST 20. SIGNIFICANCE: This is a novel approach to find potentially erroneous entries in a large-scale database with mostly unique entries, which can be applied not only to retention indices. The procedure can help filter and report mishandled data to improve the quality of the dataset for machine learning applications and experimental use.
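
The two-step procedure in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the predictions are supplied as plain lists, "bottom 5%" is read as the largest absolute errors per model, at least one entry per model is always taken (an assumption made for tiny data sets), and all names and numbers are hypothetical.

```python
def worst_fraction_indices(errors, fraction=0.05):
    """Indices of the entries with the largest absolute errors (the worst `fraction`)."""
    n_flagged = max(1, int(len(errors) * fraction))  # assumption: flag at least one entry
    order = sorted(range(len(errors)), key=lambda i: abs(errors[i]), reverse=True)
    return set(order[:n_flagged])


def flag_suspicious(experimental, predictions_by_model, fraction=0.05):
    """Flag entries that fall in the worst `fraction` for every model simultaneously."""
    flagged = None
    for predicted in predictions_by_model:
        errors = [p - e for p, e in zip(predicted, experimental)]
        worst = worst_fraction_indices(errors, fraction)
        flagged = worst if flagged is None else flagged & worst
    return sorted(flagged or [])


# Invented toy database: 20 entries, one of which (index 7) carries a gross error.
true_ri = [1000 + 10 * i for i in range(20)]
experimental = list(true_ri)
experimental[7] = 1700  # stored value is wrong; the plausible value would be 1070

# Five hypothetical models, each with small deterministic prediction errors.
predictions = [
    [t + ((i * (k + 3)) % 7 - 3) * 2 for i, t in enumerate(true_ri)]
    for k in range(5)
]

print(flag_suspicious(experimental, predictions))
```

Requiring agreement between all models keeps the flagged set small: an entry that only one model predicts poorly is more likely a model failure than a database error.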

3.
Molecules ; 28(8)2023 Apr 12.
Article in English | MEDLINE | ID: mdl-37110641

ABSTRACT

Unsymmetrical dimethylhydrazine (UDMH) is a widely used rocket propellant. Entering the environment or being stored in uncontrolled conditions, UDMH easily forms an enormous variety (at least many dozens) of transformation products. Environmental pollution by UDMH and its transformation products is a major problem in many countries and across the Arctic region. Unfortunately, previous works often use only electron ionization mass spectrometry with a library search, or they consider only the molecular formula to propose the structures of new products. This is quite an unreliable approach. It was demonstrated that a newly proposed artificial intelligence-based workflow allows for the proposal of structures of UDMH transformation products with a greater degree of certainty. The presented free and open-source software with a convenient graphical user interface facilitates the non-target analysis of industrial samples. It has bundled machine learning models for the prediction of retention indices and mass spectra. A critical analysis of whether a combination of several methods of chromatography and mass spectrometry allows us to elucidate the structure of an unknown UDMH transformation product was provided. It was demonstrated that the use of gas chromatographic retention indices for two stationary phases (polar and non-polar) allows for the rejection of false candidates in many cases when only one retention index is not enough. The structures of five previously unknown UDMH transformation products were proposed, and four previously proposed structures were refined.
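
The idea of rejecting false candidates with retention indices on two stationary phases can be sketched as a simple filter. This is an illustration only, not the published workflow: the candidate names, predicted values, and tolerances below are invented, and the function `consistent_candidates` is hypothetical.

```python
def consistent_candidates(candidates, observed, tolerance):
    """Keep candidates whose predicted retention indices agree with the observed
    values, within the tolerance, on every stationary phase that was measured."""
    kept = []
    for name, predicted in candidates.items():
        if all(abs(predicted[phase] - observed[phase]) <= tolerance[phase]
               for phase in observed):
            kept.append(name)
    return kept


# Hypothetical candidate structures with invented predicted retention indices.
candidates = {
    "candidate_A": {"non_polar": 905.0, "polar": 1410.0},
    "candidate_B": {"non_polar": 915.0, "polar": 1720.0},
    "candidate_C": {"non_polar": 1150.0, "polar": 1430.0},
}
observed = {"non_polar": 910.0, "polar": 1400.0}
tolerance = {"non_polar": 50.0, "polar": 80.0}  # assumed acceptance windows

non_polar_only = consistent_candidates(candidates, {"non_polar": 910.0}, tolerance)
both_phases = consistent_candidates(candidates, observed, tolerance)
print(non_polar_only)  # two candidates survive with one phase
print(both_phases)     # only one survives when both phases are used
```

In this toy case the non-polar index alone cannot separate candidate_A from candidate_B, while the polar index removes candidate_B.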

4.
Rapid Commun Mass Spectrom ; 37(3): e9437, 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36409456

ABSTRACT

RATIONALE: Databases of electron ionization mass spectra are often used in GC/MS-based untargeted metabolomics analysis. The results of the library search depend on several factors, such as the size and quality of the database, and the library search algorithm. We found out that the list of considered m/z values is another important parameter. Unfortunately, this information is not usually specified by software developers and it is hidden from the end user. METHODS: We created synthetic data sets and figured out how several popular software products (AMDIS, ChromaTOF, MS Search, and Xcalibur) select the list of m/z values for the library search. Moreover, we considered data sets of real mass spectra (presented in both the NIST and FiehnLib libraries) and compared the library search results obtained within different software products. All programs under consideration use the NIST MS Search binaries to perform the library search using the Identity algorithm. RESULTS: We found that AMDIS and ChromaTOF can give biased library search results under particular conditions. In untargeted metabolomics, this can happen when NIST and FiehnLib libraries are used simultaneously, the scan range of the instrument is less than 85, and the correct answer is present only in the FiehnLib library. CONCLUSIONS: The main reason for biased results is that the information about the scan range is not stored in the metadata of library records. As a result, in the case of AMDIS and ChromaTOF software, some unrecorded peaks are considered as missing during the library search, the respective compound is penalized, and the correct answer falls outside the top five or even top 10 hits. At the same time, the default algorithm for selecting the list of considered m/z values implemented in MS Search is free from such unexpected behavior.
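
The effect of the considered m/z list on a match score can be shown with a small sketch. It uses a plain cosine similarity, which is not the NIST Identity algorithm, and it assumes a hypothetical library record that was acquired only from m/z 85 upward while the query was acquired from a lower m/z; all spectra are invented.

```python
import math


def cosine_score(query, reference, mz_values):
    """Cosine similarity of two {m/z: intensity} spectra over an explicit m/z list."""
    q = [query.get(mz, 0.0) for mz in mz_values]
    r = [reference.get(mz, 0.0) for mz in mz_values]
    norm = math.sqrt(sum(v * v for v in q)) * math.sqrt(sum(v * v for v in r))
    return sum(a * b for a, b in zip(q, r)) / norm if norm else 0.0


# Invented spectra. The library record has no peaks below m/z 85 because they
# were never recorded, not because the compound does not produce them.
library_record = {91: 100.0, 119: 40.0, 147: 60.0}
query = {43: 80.0, 57: 50.0, 91: 100.0, 119: 42.0, 147: 58.0}

all_mz = sorted(set(query) | set(library_record))
shared_range_mz = [mz for mz in all_mz if mz >= 85]  # assumed common scan range

score_all = cosine_score(query, library_record, all_mz)
score_shared = cosine_score(query, library_record, shared_range_mz)
print(f"all m/z values: {score_all:.3f}")
print(f"shared scan range only: {score_shared:.3f}")
```

When the unrecorded low-mass peaks are treated as missing, the correct record is penalized; restricting the comparison to the range both spectra actually cover removes that penalty. Doing so requires knowing the scan range of the library record, which, according to the abstract, is not stored in the record metadata.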


Subject(s)
Algorithms, Software, Gas Chromatography-Mass Spectrometry/methods, Mass Spectrometry/methods, Metabolomics/methods
5.
Chemosphere ; 307(Pt 1): 135764, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35863423

ABSTRACT

Unsymmetrical dimethylhydrazine (UDMH) is a toxic and environmentally hostile compound that was massively introduced to the environment during previous decades due to its use in the space and rocket industry. The compound forms multiple transformation products, and many of them are as dangerous as UDMH or even more dangerous. The danger includes, but is not limited to, acute toxicity, chronic health hazards, carcinogenicity, and environmental damage. UDMH transformation products are poorly investigated. In this work, the mixture formed by long storage of the waste that contained UDMH was studied. Even a preliminary screening of such a mixture is a complex task. It consists of dozens of compounds, and most of them are missing in chemical and spectral databases. The complete preparative separation of such a mixture is very laborious. We applied several methods of gas chromatography-mass spectrometry and liquid chromatography-mass spectrometry, and several machine learning and chemoinformatics methods to make a preliminary but informative screening of the mixture. Machine learning allowed predicting retention indices and mass spectra of candidate structures. The combination of various ion sources and a comparison of the observed with the predicted spectra and retention were used to propose confident structures for 24 compounds. It was demonstrated that neither high-resolution mass spectrometry nor mass spectral library matching is enough to elucidate the structures of unknown UDMH transformation products. At the same time, the use of machine learning and a combination of methods significantly improves the identification power. Finally, machine learning was applied to estimate the acute toxicity of the discovered compounds. It was shown that many of them are comparable to UDMH itself in toxicity or even more toxic. Such an extremely wide and still underestimated variety of easily formed derivatives of UDMH can lead to a significant underestimation of the potential hazard of this compound.


Subject(s)
Complex Mixtures, Machine Learning, Dimethylhydrazines, Gas Chromatography-Mass Spectrometry, Mass Spectrometry/methods
6.
J Chromatogr A ; 1664: 462792, 2022 Feb 08.
Article in English | MEDLINE | ID: mdl-34999303

ABSTRACT

Retention time prediction in high-performance liquid chromatography (HPLC) is the subject of many studies since it can improve the identification of unknown molecules in untargeted profiling using HPLC coupled with high-resolution mass spectrometry. Many approaches have been developed for retention time prediction in liquid chromatography, covering data sets of various sizes, various molecular properties, and various machine learning algorithms. The recently built large retention time data set of standard compounds from the Metabolite and Chemical Entity Database (METLIN) allows researchers to create a model that can be used for retention time prediction of small molecules with a wide variety of structures and physicochemical properties. The ability to predict retention times using this largest data set was studied for different architectures of deep learning models trained on molecular fingerprints and on SMILES (a string representation of a molecule) represented as one-hot matrices. The best result was achieved with a one-dimensional convolutional neural network (1D CNN) that uses SMILES as an input. The proposed model reached a mean absolute error and a median absolute error of 34.7 and 18.7 s, respectively, which outperformed the results previously obtained for this data set. The 1D CNN pre-trained on the METLIN SMRT data set was transferred to five other data sets to evaluate its generalization ability.
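
The two accuracy measures quoted in this abstract, the mean absolute error and the median absolute error, can be computed as in the short sketch below. The retention times are invented for illustration and do not come from the METLIN SMRT data set.

```python
from statistics import mean, median


def absolute_errors(observed, predicted):
    """Per-compound absolute difference between observed and predicted values."""
    return [abs(o - p) for o, p in zip(observed, predicted)]


# Invented retention times in seconds; the fourth prediction is a deliberate outlier.
observed_rt = [312.0, 540.5, 688.0, 701.0, 955.0]
predicted_rt = [300.0, 552.5, 690.0, 760.0, 949.0]

errors = absolute_errors(observed_rt, predicted_rt)
mae = mean(errors)
medae = median(errors)
print(f"mean absolute error: {mae:.1f} s")
print(f"median absolute error: {medae:.1f} s")
```

A few badly predicted compounds raise the mean much more than the median, which is one reason both values are commonly reported together.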


Subject(s)
Reverse-Phase Chromatography, Deep Learning, Liquid Chromatography, Machine Learning, Neural Networks (Computer)
7.
Biomolecules ; 11(12)2021 Dec 19.
Article in English | MEDLINE | ID: mdl-34944547

ABSTRACT

Most frequently, the identification of peptides in mass spectrometry-based proteomics is carried out using high-resolution tandem mass spectrometry. In order to increase the accuracy of analysis, additional information on the peptides such as chromatographic retention time and collision cross section in ion mobility spectrometry can be used. An accurate prediction of the collision cross section values allows erroneous candidates to be rejected using a comparison of the observed values and the predictions based on the amino acid sequence. Recently, a massive high-quality data set of peptide collision cross sections was released. This opens up an opportunity to apply the most sophisticated deep learning techniques for this task. Previously, it was shown that a recurrent neural network allows for predicting these values accurately. In this work, we present a deep convolutional neural network that enables us to predict these values more accurately compared with previous studies. We use a neural network with a complex architecture that contains both convolutional and fully connected layers, together with comprehensive methods of converting a peptide to multi-channel 1D spatial data and a vector. The source code and pre-trained model are available online.


Subject(s)
Ions/chemistry, Peptides/chemistry, Deep Learning, Ion Mobility Spectrometry, Neural Networks (Computer), Proteomics/methods, Tandem Mass Spectrometry
8.
Int J Mol Sci ; 22(17)2021 Aug 25.
Article in English | MEDLINE | ID: mdl-34502099

ABSTRACT

Prediction of gas chromatographic retention indices based on compound structure is an important task for analytical chemistry. The predicted retention indices can be used as a reference in a mass spectrometry library search despite the fact that their accuracy is worse in comparison with the experimental reference ones. In the last few years, deep learning was applied for this task. The use of deep learning drastically improved the accuracy of retention index prediction for non-polar stationary phases. In this work, we demonstrate for the first time the use of deep learning for retention index prediction on polar (e.g., polyethylene glycol, DB-WAX) and mid-polar (e.g., DB-624, DB-210, DB-1701, OV-17) stationary phases. The achieved accuracy lies in the range of 16-50 in terms of the mean absolute error for several stationary phases and test data sets. We also demonstrate that our approach can be directly applied to the prediction of the second dimension retention times (GC × GC) if a large enough data set is available. The achieved accuracy is considerably better compared with the previous results obtained using linear quantitative structure-retention relationships and ACD ChromGenius software. The source code and pre-trained models are available online.


Subject(s)
Gas Chromatography/methods, Deep Learning, Gas Chromatography/standards
9.
Anal Chem ; 92(17): 11818-11825, 2020 Sep 01.
Article in English | MEDLINE | ID: mdl-32867500

ABSTRACT

Preliminary compound identification and peak annotation in gas chromatography-mass spectrometry are usually made using mass spectral databases. There are a few algorithms that enable a search of a spectrum in a large mass spectral library. In many cases, a library search procedure returns a wrong answer even if the correct compound is contained in the library. In this work, we present a deep learning-driven approach to a library search in order to reduce the probability of such cases. Machine learning ranking (learning to rank) is a class of machine learning and deep learning algorithms that perform a comparison (ranking) of objects. This work introduces the use of deep learning ranking for small molecule identification using low-resolution electron ionization mass spectrometry. Instead of simple similarity measures for two spectra, such as the dot product or the Euclidean distance between vectors that represent spectra, a deep convolutional neural network is used. The deep learning ranking model outperforms other approaches and reduces the fraction of wrong answers (at rank-1) by 9-23% depending on the data set used. Spectra from the Golm Metabolome Database, Human Metabolome Database, and FiehnLib were used for testing the model.
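
For context, the simple baseline that the abstract contrasts with the neural network (ranking by Euclidean distance between spectrum vectors) and the rank-1 error fraction can be sketched as follows. This is not the deep learning ranking model: the spectra and compound names are invented, and `max_mz=200` and unit-norm scaling are assumptions chosen for the example.

```python
import math


def to_unit_vector(spectrum, max_mz):
    """Turn a {m/z: intensity} spectrum into a fixed-length vector of unit norm."""
    vector = [spectrum.get(mz, 0.0) for mz in range(1, max_mz + 1)]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def rank_library(query, library, max_mz=200):
    """Library names ordered from the smallest to the largest Euclidean distance."""
    q = to_unit_vector(query, max_mz)

    def distance(name):
        return math.dist(q, to_unit_vector(library[name], max_mz))

    return sorted(library, key=distance)


def rank1_error_fraction(queries, library):
    """Fraction of queries whose top-ranked hit is not the correct compound."""
    wrong = sum(1 for true_name, spectrum in queries
                if rank_library(spectrum, library)[0] != true_name)
    return wrong / len(queries)


# Invented library and query spectra for illustration only.
library = {
    "compound_X": {43: 100.0, 58: 30.0, 71: 10.0},
    "compound_Y": {43: 20.0, 91: 100.0, 119: 45.0},
    "compound_Z": {57: 100.0, 85: 60.0, 113: 15.0},
}
queries = [
    ("compound_X", {43: 95.0, 58: 35.0, 71: 8.0}),
    ("compound_Y", {43: 25.0, 91: 100.0, 119: 40.0}),
    ("compound_Z", {43: 100.0, 57: 20.0, 58: 28.0}),  # distorted spectrum, mismatched
]

print(rank_library(queries[0][1], library))
print(f"rank-1 error fraction: {rank1_error_fraction(queries, library):.2f}")
```

A learned ranking model would replace the fixed distance function with a trained scorer, while the rank-1 error fraction could be evaluated in the same way.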


Subject(s)
Deep Learning/standards, Gas Chromatography-Mass Spectrometry/methods, Machine Learning/standards, Metabolomics/methods, Humans
10.
J Chromatogr A ; 1613: 460724, 2020 Feb 22.
Article in English | MEDLINE | ID: mdl-31787264

ABSTRACT

Porous graphitic carbon is a versatile stationary phase for high-performance liquid chromatography which performs especially well for isomeric separations. Shape-sensitivity of the stationary phase is caused by a steric effect when a molecule interacts with a flat carbon surface. It follows that branched, non-flat molecules are eluted much earlier than flat or linear molecules. In this short communication we show that if a molecule has a highly ionizable group, the "shape" of the part of the molecule that is farther from the ionizable group affects retention much more than the "shape" of the part that is closer to the ionizable group. Dipeptides which consist of tert-leucine and norleucine were used as an example. Basic and acidic eluents were used. For both basic and acidic eluents, retention strongly depends on whether a norleucine or tert-leucine residue is located near the side that is non-ionized in the eluent. A residue located on the opposite side is less important. To investigate the possible causes of this peculiar retention behavior we compared the retention behavior of these dipeptides for porous graphitic carbon with the behavior for other types of stationary phases and with the calculated physicochemical properties. A strong and complex dependence of elution order on mobile phase composition is demonstrated. The separation of other dileucine isomers is also considered. The applicability of porous graphitic carbon for the separation of complex mixtures of isomeric peptides is discussed.


Subject(s)
High-Pressure Liquid Chromatography, Graphite/chemistry, Leucine/chemistry, Dipeptides/chemistry, Dipeptides/isolation & purification, Isomerism, Leucine/isolation & purification, Porosity
11.
J Chromatogr A ; 1607: 460395, 2019 Dec 06.
Article in English | MEDLINE | ID: mdl-31405570

ABSTRACT

A deep convolutional neural network was used for the estimation of gas chromatographic retention indices on non-polar (polydimethylsiloxane and polydimethyl(5%-phenyl) siloxane) stationary phases. The neural network can be used for candidate ranking while searching a mass spectral database. A linear representation (SMILES notation) of the molecule structure was used as an input for the model. The input line was converted to a one-hot matrix and then directly processed by the neural network. The calculation of any common molecular descriptors is avoided, following the modern tendency in machine learning: to allow the neural network to find the most preferable features by itself instead of using hard-coded features. The model has two 1D-convolutional layers with 120 neurons each followed by a pooling layer and a fully-connected layer with 200 hidden neurons. The model was compared with state-of-the-art models for prediction of gas chromatographic indices based on molecular descriptors and on functional group contributions. On different data sets, better accuracy is shown together with greater versatility. The applicability to diverse sets of flavors and fragrances, essential oils, and metabolites is shown. The possibility of using the model for improvement of mass spectral identification (without a reference retention index) is demonstrated. The median absolute error and the median percentage error are in the range of 17.3 (0.93%) to 38.1 (2.15%) depending on the test data set used. Ready-to-use neural network parameters are provided.
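
The conversion of a SMILES string to a one-hot matrix mentioned in this abstract can be sketched as below. The abstract does not give the tokenization, vocabulary, or padding scheme, so everything here is an assumption: the vocabulary is a small hypothetical subset, "Cl" and "Br" are treated as single tokens, and unused rows are left as zeros.

```python
def tokenize(smiles):
    """Split a SMILES string into tokens, keeping 'Cl' and 'Br' as single tokens."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def one_hot_matrix(smiles, vocabulary, max_length):
    """Encode a SMILES string as a max_length x len(vocabulary) matrix of 0/1 values.
    Rows beyond the end of the string stay all-zero (assumed padding scheme)."""
    index = {token: k for k, token in enumerate(vocabulary)}
    tokens = tokenize(smiles)
    if len(tokens) > max_length:
        raise ValueError("SMILES is longer than max_length")
    matrix = [[0] * len(vocabulary) for _ in range(max_length)]
    for row, token in enumerate(tokens):
        matrix[row][index[token]] = 1  # raises KeyError for tokens outside the vocabulary
    return matrix


# Hypothetical, deliberately small vocabulary; a real model would need a fuller one.
vocabulary = ["C", "N", "O", "Cl", "Br", "c", "n", "(", ")", "=", "1", "2"]
matrix = one_hot_matrix("CC(=O)Cl", vocabulary, max_length=10)

for row in matrix:
    print("".join(str(v) for v in row))
```

Each row has at most a single 1, so a 1D convolution sliding along the rows sees short runs of neighboring tokens, which is what lets the network learn features directly from the string.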


Subject(s)
Gas Chromatography/methods, Neural Networks (Computer), Factual Databases, Gas Chromatography-Mass Spectrometry, Regression Analysis
12.
Chemosphere ; 217: 95-99, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30414547

ABSTRACT

Unsymmetrical dimethylhydrazine (UDMH) is a rocket propellant for carrier rockets and missiles. UDMH is an environmentally hostile compound that easily forms a variety of toxic products of oxidative transformation. The liquidation of unused UDMH from retired launch sites is performed by the complete burning of UDMH-containing wastes. Due to the cyclic operation of the burning equipment, the UDMH-containing wastes are subject to prolonged storage in contact with atmospheric oxygen and thus contain a complicated mixture of UDMH degradation products. High-performance liquid chromatography (HPLC), high-resolution mass spectrometry (HRMS), and NMR were used for the isolation and characterization of new highly polar and potentially toxic UDMH transformation products in the mixture. Two series of unreported isomers with high ionization cross section in electrospray ionization were isolated by repeated preparative HPLC. The structures of the isomers were established by tandem HRMS and NMR. The cytotoxicity of the isolated compounds was preliminarily studied and found to be similar to that of UDMH or higher.


Subject(s)
Dimethylhydrazines/chemistry, Triazoles/chemistry, High-Pressure Liquid Chromatography, Dimethylhydrazines/toxicity, Isomerism, Mass Spectrometry, Oxidation-Reduction, Oxygen/chemistry