Results 1 - 20 of 25
1.
Anal Chem ; 95(32): 11901-11907, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37540774

ABSTRACT

The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are used from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.
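The composite score described above (the average of the CSI:FingerID and DFT-IR rankings) can be illustrated with a minimal Python sketch. The candidate names and score values are invented for illustration, "higher score is better" is assumed for both inputs, and the alphabetical tie-break is an assumption; the abstract does not say how ties were handled.

```python
from statistics import mean


def rank_positions(scores):
    """Map each candidate to its 1-based rank; a higher score gets a better rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cand: pos for pos, cand in enumerate(ordered, start=1)}


def composite_ranking(ms_scores, ir_scores):
    """Order candidates by the average of their two rank positions."""
    ms_rank = rank_positions(ms_scores)
    ir_rank = rank_positions(ir_scores)
    shared = ms_rank.keys() & ir_rank.keys()
    composite = {c: mean((ms_rank[c], ir_rank[c])) for c in shared}
    # Assumption: ties in the composite score are broken alphabetically.
    return sorted(composite, key=lambda c: (composite[c], c))


# Hypothetical scores for three candidates of one test compound.
ms_scores = {"cand_A": 0.91, "cand_B": 0.85, "cand_C": 0.40}
ir_scores = {"cand_A": 0.30, "cand_B": 0.90, "cand_C": 0.60}
ranking = composite_ranking(ms_scores, ir_scores)
```

Averaging rank positions rather than raw scores avoids having to put the two scoring scales on a common footing.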


Subject(s)
Chemical Databases, Tandem Mass Spectrometry, Reproducibility of Results, Metabolomics/methods, Machine Learning
2.
Anal Chem ; 93(30): 10688-10696, 2021 08 03.
Article in English | MEDLINE | ID: mdl-34288660

ABSTRACT

The high-throughput identification of unknown metabolites in biological samples remains challenging. Most current non-targeted metabolomics studies rely on mass spectrometry, followed by computational methods that rank thousands of candidate structures based on how closely their predicted mass spectra match the experimental mass spectrum of an unknown. We reasoned that infrared (IR) spectra could be used in an analogous manner and could add orthogonal structure discrimination; however, this has never been evaluated on large data sets. Here, we present results of a high-throughput computational method for predicting IR spectra of candidate compounds obtained from the PubChem database. Predicted spectra were ranked based on their similarity to gas-phase experimental IR spectra of test compounds obtained from NIST. Our computational workflow (IRdentify) consists of a fast semiempirical quantum mechanical method for initial IR spectra prediction, ranking, and triaging, followed by a final IR spectra prediction and ranking using density functional theory. This approach resulted in the correct identification of 47% of 258 test compounds. On average, there were 2152 candidate structures evaluated for each test compound, giving a total of approximately 555,200 candidate structures evaluated. We discuss several variables that influenced the identification accuracy and then demonstrate the potential application of this approach in three areas: (1) combining IR and mass spectra rankings into a single composite rank score, (2) identifying the precursor and fragment ions using cryogenic ion vibrational spectroscopy, and (3) the incorporation of a trimethylsilyl derivatization step to extend the method compatibility to less-volatile compounds. Overall, our results suggest that matching computational with experimental IR spectra is a potentially powerful orthogonal option for adding significant high-throughput chemical structure discrimination when used with other non-targeted chemical structure identification methods.
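The two-stage idea (a fast method to triage candidates, then a more accurate method on the survivors) can be sketched in Python as follows. This is an illustration only: the abstract does not name the similarity measure, so cosine similarity is an assumption, and the spectra, candidate names, and the `accurate_spectra` lookup standing in for a DFT calculation are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra sampled on the same wavenumber grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(experimental, predicted):
    """Return candidate names ordered from most to least similar."""
    scores = {name: cosine_similarity(experimental, spec)
              for name, spec in predicted.items()}
    return sorted(scores, key=scores.get, reverse=True)


def two_stage_ranking(experimental, fast_spectra, accurate_predictor, keep):
    """Triage with cheap predictions, then re-rank the shortlist with costly ones."""
    shortlist = rank_by_similarity(experimental, fast_spectra)[:keep]
    refined = {name: accurate_predictor(name) for name in shortlist}
    return rank_by_similarity(experimental, refined)


# Hypothetical four-point spectra; real spectra would have many more points.
experimental = [0.0, 1.0, 0.2, 0.0]
fast_spectra = {
    "A": [0.0, 0.9, 0.3, 0.0],
    "B": [1.0, 0.0, 0.0, 0.1],
    "C": [0.1, 0.8, 0.0, 0.3],
}
# Stand-in for the expensive second-stage prediction.
accurate_spectra = {"A": [0.0, 1.0, 0.25, 0.0], "C": [0.0, 0.7, 0.0, 0.5]}
final = two_stage_ranking(experimental, fast_spectra, accurate_spectra.get, keep=2)
```

The `keep` cutoff controls how many candidates reach the expensive stage, which is where most of the computational cost would be expected to sit.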


Subject(s)
Metabolomics, Factual Databases, Ions, Mass Spectrometry
3.
Methods Mol Biol ; 2084: 283-295, 2020.
Article in English | MEDLINE | ID: mdl-31729668

ABSTRACT

Structure elucidation of metabolites (<1000 Da) in biofluids is extremely challenging due to the diversity and complexity of chemical structure space. Generally, due to the lack of reference tandem mass data (MS2), in silico fragmenters are used to rank candidates acquired from chemical databases as a function of how well they explain an experimental collision-induced dissociation spectrum. However, multistage fragmentation data (i.e., MS3) have not been adequately utilized in current metabolomics structure elucidation pipelines. To address this shortcoming, here we describe an experimental (nontargeted direct infusion ion mobility-mass spectrometry-based) and computational workflow to acquire and utilize multistage mass spectrometry (MS3) data for database-assisted structure elucidation.


Subject(s)
Computational Biology, Metabolomics/methods, Software, Tandem Mass Spectrometry, Computational Biology/methods, Chemical Databases, Molecular Structure, Tandem Mass Spectrometry/methods
4.
Anal Chem ; 90(21): 12752-12760, 2018 11 06.
Article in English | MEDLINE | ID: mdl-30350614

ABSTRACT

Liquid chromatography coupled with electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) is a major analytical technique used for nontargeted identification of metabolites in biological fluids. Typically, in LC-ESI-MS/MS-based, database-assisted structure elucidation pipelines, the exact mass of an unknown compound is used to mine a chemical structure database to acquire an initial set of possible candidates. Subsequent matching of the collision induced dissociation (CID) spectrum of the unknown to the CID spectra of candidate structures facilitates identification. However, this approach often fails because of the large numbers of potential candidates (i.e., false positives) for which CID spectra are not available. To overcome this problem, CID fragmentation prediction programs have been developed, but these also have limited success if large numbers of isomers with similar CID spectra are present in the candidate set. In this study, we investigated the use of a retention index (RI) predictive model as an orthogonal method to help improve identification rates. The model was used to eliminate candidate structures whose predicted RI values differed significantly from the experimentally determined RI value of the unknown compound. We tested this approach using a set of ninety-one endogenous metabolites and four in silico CID fragmentation algorithms: CFM-ID, CSI:FingerID, Mass Frontier, and MetFrag. Candidate sets obtained from PubChem and the Human Metabolome Database (HMDB) were ranked with and without RI filtering followed by in silico spectral matching. Upon RI filtering, 12 of the ninety-one metabolites were eliminated from their respective candidate sets, i.e., were scored incorrectly as negatives. For the remaining seventy-nine compounds, we show that RI filtering eliminated an average of 58% from PubChem candidate sets. This resulted in an approximately 2-fold improvement in average rankings when using CFM-ID, Mass Frontier, and MetFrag. In addition, RI filtering slightly increased the occurrence of number one rankings for all four fragmentation algorithms. However, RI filtering did not significantly improve average rankings when HMDB was used as the candidate database, nor did it significantly improve average rankings when using CSI:FingerID. Overall, we show that the current RI model incorrectly eliminated more true positives (12) than were expected (4-5) on the basis of the filtering method. However, it slightly improved the number of correct first place rankings and improved overall average rankings when using CFM-ID, Mass Frontier, and MetFrag.
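The RI filtering step can be illustrated with a short Python sketch: candidates whose predicted RI lies too far from the measured RI of the unknown are dropped before spectral matching. The tolerance value, candidate names, and RI values below are hypothetical; the abstract does not state the window that was actually used.

```python
def ri_filter(candidates, experimental_ri, tolerance):
    """Keep candidates whose predicted RI is within +/- tolerance of the measurement."""
    return [c for c in candidates
            if abs(c["predicted_ri"] - experimental_ri) <= tolerance]


def fraction_removed(before, after):
    """Share of the original candidate set eliminated by the filter."""
    return 1 - len(after) / len(before) if before else 0.0


# Hypothetical candidate set for one unknown with a measured RI of 512.
candidates = [
    {"name": "cand_1", "predicted_ri": 498.0},
    {"name": "cand_2", "predicted_ri": 655.0},
    {"name": "cand_3", "predicted_ri": 530.0},
    {"name": "cand_4", "predicted_ri": 310.0},
]
kept = ri_filter(candidates, experimental_ri=512.0, tolerance=50.0)
removed = fraction_removed(candidates, kept)
```

A narrower tolerance removes more false candidates but also raises the chance of discarding the correct structure, which is the trade-off the abstract reports (12 true positives eliminated versus 4-5 expected).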


Subject(s)
Chemical Databases/statistics & numerical data, Metabolomics/methods, Chemical Models, Neural Networks (Computer), Algorithms, Liquid Chromatography, Computer Simulation, Molecular Structure, Electrospray Ionization Mass Spectrometry
5.
J Chem Inf Model ; 58(3): 591-604, 2018 03 26.
Article in English | MEDLINE | ID: mdl-29489351

ABSTRACT

The MolFind application has been developed as a nontargeted metabolomics chemometric tool to facilitate structure identification when HPLC biofluids analysis reveals a feature of interest. Here synthetic compounds are selected and measured to form the basis of a new, more accurate, HPLC retention index model for use with MolFind. We show that relatively inexpensive synthetic screening compounds with simple structures can be used to develop an artificial neural network model that is successful in making quality predictions for human metabolites. A total of 1955 compounds were obtained and measured for the model. A separate set of 202 human metabolites was used for independent validation. The new ANN model showed improved accuracy over previous models. The model, based on relatively simple compounds, was able to make quality predictions for complex compounds not similar to training data. Independent validation metabolites with feature combinations found in three or more training compounds were predicted with 97% sensitivity, while metabolites with feature combinations found in fewer than three training compounds were predicted with >90% sensitivity. The study describes the method used to select synthetic compounds and new descriptors developed to encode the relationship between lipophilic molecular subgraphs and HPLC retention. Finally, we introduce the QRI (qualitative range of interest) modification of neural network backpropagation learning to generate models simultaneously based on quantitative and qualitative data.


Subject(s)
High Pressure Liquid Chromatography/methods, Reverse-Phase Chromatography/methods, Metabolomics/methods, Humans, Metabolome, Neural Networks (Computer)
6.
Metabolites ; 6(2)2016 May 31.
Article in English | MEDLINE | ID: mdl-27258318

ABSTRACT

Metabolite structure identification remains a significant challenge in nontargeted metabolomics research. One commonly used strategy relies on searching biochemical databases using exact mass. However, this approach fails when the database does not contain the unknown metabolite (i.e., for unknown-unknowns). For these cases, constrained structure generation with combinatorial structure generators provides a potential option. Here we evaluated structure generation constraints based on the specification of: (1) substructures required (i.e., seed structures); (2) substructures not allowed; and (3) filters to remove incorrect structures. Our approach (database assisted structure identification, DASI) used predictive models in MolFind to find candidate structures with chemical and physical properties similar to the unknown. These candidates were then used for seed structure generation using eight different structure generation algorithms. One algorithm was able to generate correct seed structures for 21/39 test compounds. Eleven of these seed structures were large enough to constrain the combinatorial structure generator to fewer than 100,000 structures. In 35/39 cases, at least one algorithm was able to generate a correct seed structure. The DASI method has several limitations and will require further experimental validation and optimization. At present, it seems most useful for identifying the structure of unknown-unknowns with molecular weights <200 Da.

7.
Bioanalysis ; 7(8): 939-55, 2015.
Article in English | MEDLINE | ID: mdl-25966007

ABSTRACT

BACKGROUND: Artificial Neural Networks (ANN) are extensively used to model 'omics' data. Different modeling methodologies and combinations of adjustable parameters influence model performance and complicate model optimization. METHODOLOGY: We evaluated optimization of four ANN modeling parameters (learning rate annealing, stopping criteria, data split method, network architecture) using retention index (RI) data for 390 compounds. Models were assessed by independent validation (I-Val) using newly measured RI values for 1492 compounds. CONCLUSION: The best model demonstrated an I-Val standard error of 55 RI units and was built using a Ward's clustering data split and a minimally nonlinear network architecture. Use of validation statistics for stopping and final model selection resulted in better independent validation performance than the use of test set statistics.


Subject(s)
Artificial Intelligence/standards, High Pressure Liquid Chromatography/methods, Metabolomics, Neural Networks (Computer), Systems Biology/standards, Cluster Analysis, Factual Databases, Tandem Mass Spectrometry/methods
8.
Metabolomics ; 11(3): 753-763, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25960696

ABSTRACT

Quantitative biases in the abundance of precursor and product ions due to mass discrimination in RF-only ion guides result in inaccurate collision induced dissociation (CID) spectra. We evaluated the effects of collision cell RF voltage and collision energy on CID spectra using ten singly protonated compounds (46-854 Da) in an orthogonal acceleration time-of-flight mass spectrometer. The relative ion transfer efficiency, i.e., the relative amount of ions transferred through the ion guide at any particular RF voltage, was shown to be dependent on the ion's m/z. We developed an algorithm to correct for the mass discriminating effects of RF voltage on CID spectra. The algorithm was tested for both precursor and product ions at multiple RF voltages and collision energies in order to ensure reliability. Our results suggest that compounds that generate major product ions with m/z values <150 have peak intensities that deviate substantially from their actual abundance. This has implications for small molecule metabolomics research, particularly for studies that rely on CID spectra matching methods for structure identification.

10.
Curr Comput Aided Drug Des ; 10(4): 374-82, 2014.
Article in English | MEDLINE | ID: mdl-25549758

ABSTRACT

A novel approach is developed for modeling situations in which the modeled property is an algebraically transformed version of the original experimental data. In many cases such a transformation results in a data set with a significantly smaller data range. Here we explore the effects of range-of-data on modeling statistics. We illustrate a two-step method using data on the mass spectrometry collision energy (CE) that is required to decompose 50% of precursor ions to fragments (CE50). Earlier we showed that a nonlinear center-of-mass transformation, yielding Ecom50, produces values less dependent on the specific mass spectrometric experimental conditions. For this data set the Ecom50 range is 13.5% of the CE50 range. We propose a two-step modeling method. First, the original experimental data, CE50 (larger range-of-data), is modeled by a standard modeling method (PLS). Second, the calculated dependent variable resulting from the modeling is algebraically transformed (not modeled) according to the center-of-mass transformation, providing the generally more useful data, Ecom50. As shown here, use of this two-step method for predicting Ecom50 (from previously published data) produces a standard error 21% smaller and correspondingly reduces the confidence interval for prediction. Some specific implications for prediction are given for a published data set. This work is part of the ongoing development of a system of models to assist in the development of human metabolites.
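The second step of the two-step method (algebraically transforming a modeled CE50 into Ecom50) can be sketched in Python. The abstract does not spell out the transformation, so this sketch assumes the textbook laboratory-frame to center-of-mass relation E_com = E_lab × m_gas / (m_gas + m_ion); argon as the collision gas, the ion masses, and the "modeled" CE50 values are likewise hypothetical.

```python
ARGON_MASS = 39.948  # assumed collision gas mass in Da; not stated in the abstract


def center_of_mass_energy(lab_energy_ev, ion_mass, gas_mass=ARGON_MASS):
    """Convert a laboratory-frame collision energy to the center-of-mass frame."""
    return lab_energy_ev * gas_mass / (gas_mass + ion_mass)


def two_step_ecom50(predicted_ce50, ion_masses, gas_mass=ARGON_MASS):
    """Step 2 only: transform already-modeled CE50 values; no model is fitted here."""
    return [center_of_mass_energy(ce, m, gas_mass)
            for ce, m in zip(predicted_ce50, ion_masses)]


# Hypothetical CE50 predictions (eV) from a step-1 model and precursor masses (Da).
predicted_ce50 = [12.0, 18.0, 30.0]
ion_masses = [150.0, 300.0, 600.0]
ecom50 = two_step_ecom50(predicted_ce50, ion_masses)
```

Because the factor m_gas / (m_gas + m_ion) shrinks as ion mass grows, heavier ions with larger CE50 values are scaled down more, which is consistent with the transformed values spanning a much narrower range than the original ones.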


Subject(s)
Statistical Models, Humans, Mass Spectrometry
11.
J Chem Inf Model ; 53(9): 2483-92, 2013 Sep 23.
Article in English | MEDLINE | ID: mdl-23991755

ABSTRACT

Current methods of structure identification in mass-spectrometry-based nontargeted metabolomics rely on matching experimentally determined features of an unknown compound to those of candidate compounds contained in biochemical databases. A major limitation of this approach is the relatively small number of compounds currently included in these databases. If the correct structure is not present in a database, it cannot be identified, and if it cannot be identified, it cannot be included in a database. Thus, there is an urgent need to augment metabolomics databases with rationally designed biochemical structures using alternative means. Here we present the In Vivo/In Silico Metabolites Database (IIMDB), a database of in silico enzymatically synthesized metabolites, to partially address this problem. The database, which is available at http://metabolomics.pharm.uconn.edu/iimdb/, includes ~23,000 known compounds (mammalian metabolites, drugs, secondary plant metabolites, and glycerophospholipids) collected from existing biochemical databases plus more than 400,000 computationally generated human phase-I and phase-II metabolites of these known compounds. IIMDB features a user-friendly web interface and a programmer-friendly RESTful web service. Ninety-five percent of the computationally generated metabolites in IIMDB were not found in any existing database. However, 21,640 were identical to compounds already listed in PubChem, HMDB, KEGG, or HumanCyc. Furthermore, the vast majority of these in silico metabolites were scored as biological using BioSM, a software program that identifies biochemical structures in chemical structure space. These results suggest that in silico biochemical synthesis represents a viable approach for significantly augmenting biochemical databases for nontargeted metabolomics applications.


Subject(s)
Factual Databases, Enzymes/metabolism, Metabolomics/methods, Animals, Glycerophospholipids/metabolism, Humans, Internet, Pharmaceutical Preparations/metabolism, Plants/metabolism, User-Computer Interface
12.
J Chem Inf Model ; 53(3): 601-12, 2013 Mar 25.
Article in English | MEDLINE | ID: mdl-23330685

ABSTRACT

The structural identification of unknown biochemical compounds in complex biofluids continues to be a major challenge in metabolomics research. Using LC/MS, there are currently two major options for solving this problem: searching small biochemical databases, which often do not contain the unknown of interest, or searching large chemical databases, which include large numbers of nonbiochemical compounds. Searching larger chemical databases (larger chemical space) increases the odds of identifying an unknown biochemical compound, but only if nonbiochemical structures can be eliminated from consideration. In this paper we present BioSM, a cheminformatics tool that uses known endogenous mammalian biochemical compounds (as scaffolds) and graph matching methods to identify endogenous mammalian biochemical structures in chemical structure space. The results of a comprehensive set of empirical experiments suggest that BioSM identifies endogenous mammalian biochemical structures with high accuracy. In a leave-one-out cross validation experiment, BioSM correctly predicted 95% of 1388 Kyoto Encyclopedia of Genes and Genomes (KEGG) compounds as endogenous mammalian biochemicals using 1565 scaffolds. Analysis of two additional biological data sets containing 2330 human metabolites (HMDB) and 2416 plant secondary metabolites (KEGG) resulted in biochemical annotations of 89% and 72% of the compounds, respectively. When a data set of 3895 drugs (DrugBank and USAN) was tested, 48% of these structures were predicted to be biochemical. However, when a set of synthetic chemical compounds (Chembridge and Chemsynthesis databases) was examined, only 29% of the 458,207 structures were predicted to be biochemical. Moreover, BioSM predicted that 34% of 883,199 randomly selected compounds from PubChem were biochemical. We then expanded the scaffold list to 3927 biochemical compounds and reevaluated the above data sets to determine whether scaffold number influenced model performance. Although there were significant improvements in model sensitivity and specificity using the larger scaffold list, the data set comparison results were very similar. These results suggest that additional biochemical scaffolds will not further improve our representation of biochemical structure space and that the model is reasonably robust. BioSM provides a qualitative (yes/no) and quantitative (ranking) method for endogenous mammalian biochemical annotation of chemical space and, thus, will be useful in the identification of unknown biochemical structures in metabolomics. BioSM is freely available at http://metabolomics.pharm.uconn.edu.


Subject(s)
Mammals/metabolism, Metabolomics/methods, Algorithms, Animals, Artificial Intelligence, Body Fluids/chemistry, Cytochromes, Protein Databases, Humans, Chemical Models, Molecular Models, Reproducibility of Results, Small Molecule Libraries
13.
Comput Struct Biotechnol J ; 5: e201302005, 2013.
Article in English | MEDLINE | ID: mdl-24688698

ABSTRACT

The identification of compounds in complex mixtures remains challenging despite recent advances in analytical techniques. At present, no single method can detect and quantify the vast array of compounds that might be of potential interest in metabolomics studies. High performance liquid chromatography/mass spectrometry (HPLC/MS) is often considered the analytical method of choice for analysis of biofluids. The positive identification of an unknown involves matching at least two orthogonal HPLC/MS measurements (exact mass, retention index, drift time etc.) against an authentic standard. However, due to the limited availability of authentic standards, an alternative approach involves matching known and measured features of the unknown compound with computationally predicted features for a set of candidate compounds downloaded from a chemical database. Computationally predicted features include retention index, ECOM50 (energy required to decompose 50% of a selected precursor ion in a collision induced dissociation cell), drift time, whether the unknown compound is biological or synthetic and a collision induced dissociation (CID) spectrum. Computational predictions are used to filter the initial "bin" of candidate compounds. The final output is a ranked list of candidates that best match the known and measured features. In this mini review, we discuss cheminformatics methods underlying this database search-filter identification approach.

14.
Anal Chem ; 84(21): 9388-94, 2012 Nov 06.
Article in English | MEDLINE | ID: mdl-23039714

ABSTRACT

In this paper, we present MolFind, a highly multithreaded pipeline-type software package for use as an aid in identifying chemical structures in complex biofluids and mixtures. MolFind is specifically designed for high-performance liquid chromatography/mass spectrometry (HPLC/MS) data inputs typical of metabolomics studies where structure identification is the ultimate goal. MolFind enables compound identification by matching HPLC/MS-based experimental data obtained for an unknown compound with computationally derived HPLC/MS values for candidate compounds downloaded from chemical databases such as PubChem. The downloaded "bins" consist of all compounds matching the monoisotopic molecular weight of the unknown. The computational HPLC/MS values predicted include retention index (RI), ECOM50 (energy required to fragment 50% of a selected precursor ion), drift time, and collision induced dissociation (CID) spectrum. RI, ECOM50, and drift-time models are used for filtering compounds downloaded from PubChem. The remaining candidates are then ranked based on CID spectra matching. Current RI and ECOM50 models allow for the removal of about 28% of compounds from PubChem bins. Our estimates suggest that this could be improved to as much as 87% with additional chemical structures included in the computational models. Quantitative structure property relationship-based modeling of drift times showed a better correlation with experimentally determined drift times than did Mobcal cross-sectional areas. In 23 of 35 example cases, filtering PubChem bins with RI and ECOM50 predictive models resulted in improved ranking of the unknown compounds compared to previous studies using CID spectra matching alone. In 19 of 35 examples, the correct candidate was ranked within the top 20 compounds in bins containing an average of 1635 compounds.


Subject(s)
High Pressure Liquid Chromatography/methods, Mass Spectrometry/methods, Software
15.
Rapid Commun Mass Spectrom ; 26(19): 2303-10, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22956322

ABSTRACT

RATIONALE: The determination of the center-of-mass energy at which 50% of a precursor ion decomposes (Ecom50) during collision-induced dissociation (CID) is dependent on the chemical structure of the ion as well as the physical and electrical characteristics of the collision cell. The current study was designed to identify variables influencing Ecom50 values measured on four different mass spectrometers. METHODS: Fifteen test compounds were protonated using positive-mode electrospray ionization and the resulting ions were fragmented across a range of collision energies by CID. Survival yield versus collision energy curves were then used to calculate Ecom50 values for each of these [M+H]+ ions on four different mass spectrometers. In addition, the relative recovery of the [M+H]+ ions of eight compounds ranging in molecular weight from 46 to 854 Da was determined at collision cell radiofrequency (RF) voltages ranging from 0 to 600 V. RESULTS: Ecom50 values determined on the four instruments were highly correlated (r² values ranged from 0.953 to 0.992). Although these overall correlations were high, we found different maximum ion recoveries depending on collision cell RF voltage. High-mass ions had greater recovery at higher collision cell RF voltages, whereas low-mass ions had greater recovery at lower collision cell RF voltages as well as a broader range of ion recoveries. CONCLUSIONS: Ecom50 values measured on four different instruments correlated surprisingly well given the differences in electrical and physical characteristics of the collision cells. However, our results suggest caution when comparing Ecom50 values or CID spectra between instruments without correcting for the effects of RF voltage on ion transfer efficiency.


Subject(s)
Electrospray Ionization Mass Spectrometry/methods, Electrospray Ionization Mass Spectrometry/standards, Benzimidazoles/chemistry, Ions/chemistry, Linear Models, Chemical Models, Molecular Weight, Reference Standards
16.
J Chem Inf Model ; 52(5): 1222-37, 2012 May 25.
Article in English | MEDLINE | ID: mdl-22489687

ABSTRACT

The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sample or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental "features" such as retention time, exact mass, and collision induced dissociation spectra. Thus, without knowing structure, biological significance remains obscure. In this study, we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds with predicted values substantially different from the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features: MS Ecom50 (the energy in electronvolts required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a data set of 52 compounds, Ecom50 models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back-propagation artificial neural network and 33 Molconn structure descriptors. External validation gave a v² = 0.87 and standard error of 38 retention index units. As a test of the validity of the filtering approach, the Ecom50 and retention index models, along with exact mass and collision induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids and its elemental formula was identical to 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom50 and retention index predictive models can improve nontargeted metabolite structure identification using HPLC/MS derived structural features.


Subject(s)
High Pressure Liquid Chromatography, Mass Spectrometry, Metabolomics/methods, Biological Models, Urea/analogs & derivatives, Factual Databases, Humans, Urea/blood, Urea/chemistry
17.
J Am Soc Mass Spectrom ; 20(9): 1759-67, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19616966

ABSTRACT

Survival yield analysis is routinely used in mass spectrometry as a tool for assessing precursor ion stability and internal energy. Because ion internal energy and decomposition reaction rates are dependent on chemical structure, we reasoned that survival yield curves should be compound-specific and therefore useful for chemical identification. In this study, a quantitative approach for analyzing the correlation between survival yield and collision energy was developed and validated. This method is based on determining the collision energy (CE) at which the survival yield is 50% (CE50) and, further, provides slope and intercept values for each survival yield curve. In initial experiments using a defined set of homologous compounds, we found that CE50 values were easily determined, quantitative, highly reproducible, and could discriminate between structural and even positional isomers. Further analysis demonstrated that CE50 values were independent of cone potential and orthogonal to compound mass. Experimentally determined CE50 values for a diverse set of 54 compounds were correlated to Molconn molecular structure descriptors. The resulting model yielded a statistically significant linear correlation between experimental and calculated CE50 values and identified several structural characteristics related to precursor ion stability and fragmentation mechanism. Thus, the CE50 is a promising method for compound identification and discrimination.
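As a rough illustration of how a CE50 value could be read off a survival yield curve, the Python sketch below uses the common definition of survival yield (precursor intensity divided by total precursor plus fragment intensity) and linear interpolation between the two measurements that bracket 50%. Both choices are assumptions; the abstract mentions slope and intercept values but does not describe the fitting procedure, and the intensities and energies here are invented.

```python
def survival_yield(precursor_intensity, fragment_intensities):
    """Fraction of the precursor ion that survives at one collision energy."""
    total = precursor_intensity + sum(fragment_intensities)
    return precursor_intensity / total if total else 0.0


def ce50(collision_energies, yields):
    """Collision energy at 50% survival, by linear interpolation.

    Expects energies in ascending order; returns None if the curve
    never crosses 0.5 within the measured range.
    """
    points = list(zip(collision_energies, yields))
    for (e0, y0), (e1, y1) in zip(points, points[1:]):
        if y0 >= 0.5 >= y1 and y0 != y1:
            return e0 + (y0 - 0.5) * (e1 - e0) / (y0 - y1)
    return None


# Hypothetical survival yield curve for one precursor ion.
energies = [5.0, 10.0, 15.0, 20.0]
yields = [0.95, 0.70, 0.30, 0.05]
estimate = ce50(energies, yields)
```

With these invented values the curve crosses 50% between 10 and 15 eV, so the estimate falls at about 12.5 eV.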


Subject(s)
Algorithms, Mass Spectrometry/methods, Chemical Models, Computer Simulation, Molecular Weight
18.
J Chem Inf Model ; 49(4): 788-99, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19309176

ABSTRACT

A back-propagation artificial neural network (ANN) was used to create a 10-fold leave-10%-out cross-validated ensemble model of high performance liquid chromatography retention index (HPLC-RI) for a data set of 498 diverse druglike compounds. A 10-fold multiple linear regression (MLR) ensemble model of the same data was developed for comparison. Molecular structure was described using IGroup E-state indices, a novel set of structure-information representation (SIR) descriptors, along with molecular connectivity chi and kappa indices and other SIR descriptors previously reported. The same input descriptors were used to develop models by both learning algorithms. The MLR model yielded marginally acceptable statistics with training correlation r² = 0.65, mean absolute error (MAE) = 83 RI units. External validation of 104 compounds not used for model development yielded validation v² = 0.49 and MAE = 73 RI units. The distribution of residuals for the fit and validate data sets suggests a nonlinear relationship between retention index and molecular structure as described by the SIR indices. Not surprisingly, the ANN model was significantly more accurate for both training and validation with training set r² = 0.93, MAE = 30 RI units and validation v² = 0.84, MAE = 41 RI units. For the ANN model, a total of 91% of validation predictions were within 100 RI units of the experimental value.
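The two validation statistics quoted above (mean absolute error and the share of predictions within 100 RI units) are simple to compute; a minimal Python sketch follows. The experimental and predicted RI values are invented for illustration and are not taken from the study.

```python
def mean_absolute_error(experimental, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)


def fraction_within(experimental, predicted, limit):
    """Share of predictions whose absolute error does not exceed the limit."""
    hits = sum(abs(e - p) <= limit for e, p in zip(experimental, predicted))
    return hits / len(experimental)


# Hypothetical validation set of four compounds (RI units).
experimental_ri = [500.0, 620.0, 710.0, 830.0]
predicted_ri = [520.0, 600.0, 850.0, 835.0]
mae = mean_absolute_error(experimental_ri, predicted_ri)
within_100 = fraction_within(experimental_ri, predicted_ri, limit=100.0)
```

Reporting both numbers is useful because a single large miss (here the third compound) inflates the MAE while leaving most predictions inside the window.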


Subject(s)
Chromatography, High Pressure Liquid/statistics & numerical data; Neural Networks, Computer; Algorithms; Artificial Intelligence; Cluster Analysis; Databases, Factual; Forecasting; Linear Models; Models, Chemical; Quantitative Structure-Activity Relationship; Reproducibility of Results; Subject Headings
19.
Bioanalysis ; 1(9): 1627-43, 2009 Dec.
Article in English | MEDLINE | ID: mdl-21083108

ABSTRACT

MS and HPLC are commonly used for compound characterization and for obtaining structural information; in the field of metabonomics, these two analytical techniques are often combined to characterize unknown endogenous or exogenous metabolites present in complex biological samples. Since the structures of a majority of these metabolites are not actually identified, the result of most metabonomic studies is a list of m/z values and retention times. However, without knowing the actual structures, the biological significance of these 'features' cannot be determined. The process of identifying the structures of unknown compounds can be time-intensive and costly, and it frequently requires multiple orthogonal analytical techniques. This laborious procedure seems insurmountable for the long lists of unknowns that must be identified for each study. In addition, the limited sample volume and the extremely low concentration of most endogenous analytes frequently make purification and identification by other instrumentation nearly impossible. This review explores the problems and progress with the tools currently available for MS-based structure identification of both endogenous and exogenous metabolites.


Subject(s)
Body Fluids/chemistry; Body Fluids/metabolism; Databases, Factual; Mass Spectrometry/methods; Metabolomics/methods; Humans; Molecular Structure
20.
Anal Chem ; 80(14): 5574-82, 2008 Jul 15.
Article in English | MEDLINE | ID: mdl-18547062

ABSTRACT

Despite recent advances in NMR and mass spectrometry, the structural identification of organic compounds in complex biofluids remains a significant analytical challenge. For mass spectrometry applications, chemical identification is generally limited to determination of the elemental formula. Here we test the hypothesis that unknown chemical structures can be determined by matching their experimental collision-induced dissociation (CID) fragmentation spectra with computational fragmentation spectra of compounds retrieved from chemical databases. The monoisotopic molecular weights (MIMW ± 10 ppm) of 102 "test" compounds were used to download 102 "bins" from the PubChem database. Each bin contained the corresponding test compound and, on average, 272 other candidate compounds, including 158 compounds having the same elemental formula as the test compound. Commercially available software was used to generate fragmentation spectra for all compounds in each of the 102 bins. Experimental CID spectra for each of the 102 test compounds were then compared to the computational spectra in order to rank candidate compounds by the number of fragment MIMW matches. This method returned the test compound as the highest-ranking compound (or tied for the highest rank) for 65 of the 102 bins. The test compound was ranked within the top 20 candidate compounds for 87 bins. In addition, the correct elemental formula was ranked first for 98 of the 102 bins. Thus, matching experimental with computational fragmentation spectra is a valid method for rapidly discriminating among compounds having the same elemental formula and provides a novel approach for querying chemical databases for structural information.
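
The ranking step described in this abstract, scoring each candidate by how many experimental fragment masses it reproduces, can be sketched as follows. This is a simplified illustration, not the commercial software or scoring used in the study. The 10 ppm fragment tolerance is an assumption (the abstract gives ±10 ppm only for retrieving candidate bins), and the function names, candidate labels, and m/z values are hypothetical.

```python
def count_fragment_matches(experimental_mz, predicted_mz, tol_ppm=10.0):
    """Count experimental peaks with a predicted fragment within tol_ppm."""
    matches = 0
    for exp in experimental_mz:
        window = exp * tol_ppm / 1e6  # absolute tolerance at this m/z
        if any(abs(exp - pred) <= window for pred in predicted_mz):
            matches += 1
    return matches


def rank_candidates(experimental_mz, candidates, tol_ppm=10.0):
    """Return (name, match_count, rank) tuples, best candidate first.

    Candidates with equal match counts share the same rank, mirroring the
    "tied with the highest ranking" wording of the abstract.
    """
    scored = sorted(
        ((name, count_fragment_matches(experimental_mz, frags, tol_ppm))
         for name, frags in candidates.items()),
        key=lambda item: -item[1],
    )
    ranked = []
    for position, (name, score) in enumerate(scored, start=1):
        if ranked and ranked[-1][1] == score:
            rank = ranked[-1][2]  # tie: reuse the previous candidate's rank
        else:
            rank = position
        ranked.append((name, score, rank))
    return ranked


# Illustrative values only, not spectra from the study.
experimental = [91.0542, 119.0491, 147.0441]
bin_candidates = {
    "A": [91.0542, 119.0491, 147.0441],
    "B": [91.0542, 105.0335],
    "C": [91.0545, 119.0600, 147.0441],
    "D": [91.0545, 119.0600, 147.0441],
}
ranking = rank_candidates(experimental, bin_candidates)
```

Counting matches ignores peak intensities, which keeps the score simple but means candidates that share many fragments tie easily; that is one reason a top-20 rate is reported alongside the top-1 rate.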


Subject(s)
Computers; Databases, Factual; Mass Spectrometry/methods; Molecular Weight; Software