Search | Virtual Health Library

1.

Combining Experimental with Computational Infrared and Mass Spectra for High-Throughput Nontargeted Chemical Structure Identification.

Karunaratne, Erandika; Hill, Dennis W; Dührkop, Kai; Böcker, Sebastian; Grant, David F.

Anal Chem ; 95(32): 11901-11907, 2023 08 15.

Article in English | MEDLINE | ID: mdl-37540774

ABSTRACT

The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are used from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.
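The composite score described in this abstract (the average of the two rankings) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the candidate identifiers, the example ranks, and the tie-breaking rule are all assumptions.

```python
def composite_ranking(ms_ranks, ir_ranks):
    """Order candidates by the average of their two ranks (1 = best)."""
    # Only candidates scored by both methods can receive a composite score.
    shared = ms_ranks.keys() & ir_ranks.keys()
    scores = {cid: (ms_ranks[cid] + ir_ranks[cid]) / 2 for cid in shared}
    # Ties are broken by identifier for determinism (an assumption).
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Hypothetical ranks standing in for CSI:FingerID and DFT-IR output.
ms_ranks = {"cand_A": 1, "cand_B": 2, "cand_C": 3}
ir_ranks = {"cand_A": 3, "cand_B": 1, "cand_C": 2}

ranking = composite_ranking(ms_ranks, ir_ranks)
for candidate, score in ranking:
    print(candidate, score)
```

Averaging ranks rather than raw scores sidesteps the fact that the two methods score on different scales.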

Subjects

Chemical Databases; Tandem Mass Spectrometry; Reproducibility of Results; Metabolomics/methods; Machine Learning

2.

High-Throughput Non-targeted Chemical Structure Identification Using Gas-Phase Infrared Spectra.

Karunaratne, Erandika; Hill, Dennis W; Pracht, Philipp; Gascón, José A; Grimme, Stefan; Grant, David F.

Anal Chem ; 93(30): 10688-10696, 2021 08 03.

Article in English | MEDLINE | ID: mdl-34288660

ABSTRACT

The high-throughput identification of unknown metabolites in biological samples remains challenging. Most current non-targeted metabolomics studies rely on mass spectrometry, followed by computational methods that rank thousands of candidate structures based on how closely their predicted mass spectra match the experimental mass spectrum of an unknown. We reasoned that infrared (IR) spectra could be used in an analogous manner and could add orthogonal structure discrimination; however, this has never been evaluated on large data sets. Here, we present results of a high-throughput computational method for predicting IR spectra of candidate compounds obtained from the PubChem database. Predicted spectra were ranked based on their similarity to gas-phase experimental IR spectra of test compounds obtained from NIST. Our computational workflow (IRdentify) consists of a fast semiempirical quantum mechanical method for initial IR spectra prediction, ranking, and triaging, followed by a final IR spectra prediction and ranking using density functional theory. This approach resulted in the correct identification of 47% of 258 test compounds. On average, there were 2152 candidate structures evaluated for each test compound, giving a total of approximately 555,200 candidate structures evaluated. We discuss several variables that influenced the identification accuracy and then demonstrate the potential application of this approach in three areas: (1) combining IR and mass spectra rankings into a single composite rank score, (2) identifying the precursor and fragment ions using cryogenic ion vibrational spectroscopy, and (3) the incorporation of a trimethylsilyl derivatization step to extend the method compatibility to less-volatile compounds. Overall, our results suggest that matching computational with experimental IR spectra is a potentially powerful orthogonal option for adding significant high-throughput chemical structure discrimination when used with other non-targeted chemical structure identification methods.

Subjects

Metabolomics; Factual Databases; Ions; Mass Spectrometry

3.

Comprehensive Assessment of GFN Tight-Binding and Composite Density Functional Theory Methods for Calculating Gas-Phase Infrared Spectra.

Pracht, Philipp; Grant, David F; Grimme, Stefan.

J Chem Theory Comput ; 16(11): 7044-7060, 2020 Nov 10.

Article in English | MEDLINE | ID: mdl-33054183

ABSTRACT

Vibrational spectroscopy is a valuable and widely used analytical tool for the characterization of chemical substances. We investigate the performance of semiempirical quantum mechanical GFN tight-binding and force-field methods for the calculation of gas-phase infrared spectra in comparison to experiment and low-cost (B3LYP-3c) density functional theory. A data set of 7247 experimental references was used to evaluate method performance based on automatic spectra comparison. Various quantitative spectral similarity measures were employed for the comparison between theory and experiment and for determining empirical scaling factors. It is shown that the scaling of atomic masses provides an accurate yet simple alternative to standard global frequency scaling in density functional theory (DFT) and semiempirical calculations. Furthermore, the method performance for 58 exemplary transition metal complexes was investigated. The efficient DFT composite method B3LYP-3c, which was introduced in the course of this work, was found to be excellently suited for general IR spectra calculations. The GFN1- and GFN2-xTB tight-binding methods clearly outperformed the PMx competitors. Conformational changes were investigated for a subset of the data and were found to have only a moderate influence on the simulated spectra, suggesting that the corresponding elaborate sampling steps may be neglected in automated compound identification workflows.
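Why mass scaling can stand in for frequency scaling follows from the harmonic-oscillator relation, sketched below. This is a textbook identity rather than the authors' parameterization: the abstract does not say which atomic masses were scaled or by how much, so the uniform factor s is an assumption made purely for illustration.

```latex
% Harmonic wavenumber of mode i with force constant k_i and reduced mass \mu_i
\tilde{\nu}_i = \frac{1}{2\pi c}\sqrt{\frac{k_i}{\mu_i}}

% Scaling every atomic mass by the same factor s scales every reduced mass by s
m_A \to s\, m_A \ \ \forall A
\quad\Longrightarrow\quad
\tilde{\nu}_i' = \frac{\tilde{\nu}_i}{\sqrt{s}}
```

Under this uniform assumption, mass scaling is equivalent to a global frequency scaling factor of s^(-1/2). Scaling only some masses would instead shift each mode according to how much those atoms move in it, which a single global factor cannot reproduce.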

4.

MolFind²: A Protocol for Acquiring and Integrating MS³ Data to Improve In Silico Chemical Structure Elucidation for Metabolomics.

Samaraweera, Milinda A; Hill, Dennis W; Grant, David F.

Methods Mol Biol ; 2084: 283-295, 2020.

Article in English | MEDLINE | ID: mdl-31729668

ABSTRACT

Structure elucidation of metabolites (<1000 Da) in biofluids is extremely challenging due to the diversity and complexity of chemical structure space. Generally, due to the lack of reference tandem mass data (MS2), in silico fragmenters are used to rank candidates acquired from chemical databases as a function of how well they explain an experimental collision-induced dissociation spectrum. However, multistage fragmentation data (i.e., MS3) have not been adequately utilized in current metabolomics structure elucidation pipelines. To address this shortcoming, here we describe an experimental (nontargeted direct infusion ion mobility-mass spectrometry-based) and computational workflow to acquire and utilize multistage mass spectrometry (MS3) data for database-assisted structure elucidation.

Subjects

Computational Biology; Metabolomics/methods; Software; Tandem Mass Spectrometry; Computational Biology/methods; Chemical Databases; Molecular Structure; Tandem Mass Spectrometry/methods

5.

Evaluation of an Artificial Neural Network Retention Index Model for Chemical Structure Identification in Nontargeted Metabolomics.

Samaraweera, Milinda A; Hall, L Mark; Hill, Dennis W; Grant, David F.

Anal Chem ; 90(21): 12752-12760, 2018 11 06.

Article in English | MEDLINE | ID: mdl-30350614

ABSTRACT

Liquid chromatography coupled with electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) is a major analytical technique used for nontargeted identification of metabolites in biological fluids. Typically, in LC-ESI-MS/MS-based database-assisted structure elucidation pipelines, the exact mass of an unknown compound is used to mine a chemical structure database to acquire an initial set of possible candidates. Subsequent matching of the collision-induced dissociation (CID) spectrum of the unknown to the CID spectra of candidate structures facilitates identification. However, this approach often fails because of the large numbers of potential candidates (i.e., false positives) for which CID spectra are not available. To overcome this problem, CID fragmentation prediction programs have been developed, but these also have limited success if large numbers of isomers with similar CID spectra are present in the candidate set. In this study, we investigated the use of a retention index (RI) predictive model as an orthogonal method to help improve identification rates. The model was used to eliminate candidate structures whose predicted RI values differed significantly from the experimentally determined RI value of the unknown compound. We tested this approach using a set of ninety-one endogenous metabolites and four in silico CID fragmentation algorithms: CFM-ID, CSI:FingerID, Mass Frontier, and MetFrag. Candidate sets obtained from PubChem and the Human Metabolome Database (HMDB) were ranked with and without RI filtering followed by in silico spectral matching. Upon RI filtering, 12 of the ninety-one metabolites were eliminated from their respective candidate sets, i.e., were scored incorrectly as negatives. For the remaining seventy-nine compounds, we show that RI filtering eliminated an average of 58% from PubChem candidate sets. This resulted in an approximately 2-fold improvement in average rankings when using CFM-ID, Mass Frontier, and MetFrag. In addition, RI filtering slightly increased the occurrence of number one rankings for all 4 fragmentation algorithms. However, RI filtering did not significantly improve average rankings when HMDB was used as the candidate database, nor did it significantly improve average rankings when using CSI:FingerID. Overall, we show that the current RI model incorrectly eliminated more true positives (12) than were expected (4-5) on the basis of the filtering method. However, it slightly improved the number of correct first place rankings and improved overall average rankings when using CFM-ID, Mass Frontier, and MetFrag.
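The RI filtering step described in this abstract can be sketched as a simple window test on predicted values. This is an illustrative sketch only: the abstract does not give the rejection criterion, so the fixed `window`, the field names, and the example values are assumptions.

```python
def ri_filter(candidates, experimental_ri, window):
    """Keep candidates whose predicted RI lies within +/- window of the measured RI."""
    return [c for c in candidates
            if abs(c["predicted_ri"] - experimental_ri) <= window]


# Hypothetical candidate set with model-predicted retention indices.
candidates = [
    {"id": "cand_A", "predicted_ri": 412.0},
    {"id": "cand_B", "predicted_ri": 655.0},
    {"id": "cand_C", "predicted_ri": 430.0},
]

kept = ri_filter(candidates, experimental_ri=420.0, window=50.0)
removed_fraction = 1 - len(kept) / len(candidates)
print([c["id"] for c in kept], removed_fraction)
```

The window width sets the trade-off the abstract reports: a narrower window removes more false candidates but also risks discarding the true structure when its predicted RI is poor.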

Subjects

Chemical Databases/statistics & numerical data; Metabolomics/methods; Chemical Models; Computer Neural Networks; Algorithms; Liquid Chromatography; Computer Simulation; Molecular Structure; Electrospray Ionization Mass Spectrometry

6.

Development of a Reverse Phase HPLC Retention Index Model for Nontargeted Metabolomics Using Synthetic Compounds.

Hall, L Mark; Hill, Dennis W; Bugden, Kelly; Cawley, Shannon; Hall, Lowell H; Chen, Ming-Hui; Grant, David F.

J Chem Inf Model ; 58(3): 591-604, 2018 03 26.

Article in English | MEDLINE | ID: mdl-29489351

ABSTRACT

The MolFind application has been developed as a nontargeted metabolomics chemometric tool to facilitate structure identification when HPLC biofluids analysis reveals a feature of interest. Here synthetic compounds are selected and measured to form the basis of a new, more accurate HPLC retention index model for use with MolFind. We show that relatively inexpensive synthetic screening compounds with simple structures can be used to develop an artificial neural network model that is successful in making quality predictions for human metabolites. A total of 1955 compounds were obtained and measured for the model. A separate set of 202 human metabolites was used for independent validation. The new ANN model showed improved accuracy over previous models. The model, based on relatively simple compounds, was able to make quality predictions for complex compounds not similar to training data. Independent validation metabolites with feature combinations found in three or more training compounds were predicted with 97% sensitivity, while metabolites with feature combinations found in fewer than three training compounds were predicted with >90% sensitivity. The study describes the method used to select synthetic compounds and new descriptors developed to encode the relationship between lipophilic molecular subgraphs and HPLC retention. Finally, we introduce the QRI (qualitative range of interest) modification of neural network backpropagation learning to generate models simultaneously based on quantitative and qualitative data.

Subjects

High-Pressure Liquid Chromatography/methods; Reverse-Phase Chromatography/methods; Metabolomics/methods; Humans; Metabolome; Computer Neural Networks

7.

Yale school of public health symposium on lifetime exposures and human health: the exposome; summary and future reflections.

Johnson, Caroline H; Athersuch, Toby J; Collman, Gwen W; Dhungana, Suraj; Grant, David F; Jones, Dean P; Patel, Chirag J; Vasiliou, Vasilis.

Hum Genomics ; 11(1): 32, 2017 Dec 08.

Article in English | MEDLINE | ID: mdl-29221465

ABSTRACT

The exposome is defined as "the totality of environmental exposures encountered from birth to death" and was developed to address the need for comprehensive environmental exposure assessment to better understand disease etiology. Due to the complexity of the exposome, significant efforts have been made to develop technologies for longitudinal, internal and external exposure monitoring, and bioinformatics to integrate and analyze datasets generated. Our objectives were to bring together leaders in the field of exposomics, at a recent Symposium on "Lifetime Exposures and Human Health: The Exposome," held at Yale School of Public Health. Our aim was to highlight the most recent technological advancements for measurement of the exposome, bioinformatics development, current limitations, and future needs in environmental health. In the discussions, an emphasis was placed on moving away from a one-chemical one-health outcome model toward a new paradigm of monitoring the totality of exposures that individuals may experience over their lifetime. This is critical to better understand the underlying biological impact on human health, particularly during windows of susceptibility. Recent advancements in metabolomics and bioinformatics are driving the field forward in biomonitoring and understanding the biological impact, and the technological and logistical challenges involved in the analyses were highlighted. In conclusion, further developments and support are needed for large-scale biomonitoring and management of big data, standardization for exposure and data analyses, bioinformatics tools for co-exposure or mixture analyses, and methods for data sharing.

Subjects

Environmental Exposure; Environmental Health; Environmental Monitoring/methods; Humans; Public Health; Scientific Societies

8.

Development of Database Assisted Structure Identification (DASI) Methods for Nontargeted Metabolomics.

Menikarachchi, Lochana C; Dubey, Ritvik; Hill, Dennis W; Brush, Daniel N; Grant, David F.

Metabolites ; 6(2), 2016 May 31.

Article in English | MEDLINE | ID: mdl-27258318

ABSTRACT

Metabolite structure identification remains a significant challenge in nontargeted metabolomics research. One commonly used strategy relies on searching biochemical databases using exact mass. However, this approach fails when the database does not contain the unknown metabolite (i.e., for unknown-unknowns). For these cases, constrained structure generation with combinatorial structure generators provides a potential option. Here we evaluated structure generation constraints based on the specification of: (1) substructures required (i.e., seed structures); (2) substructures not allowed; and (3) filters to remove incorrect structures. Our approach (database assisted structure identification, DASI) used predictive models in MolFind to find candidate structures with chemical and physical properties similar to the unknown. These candidates were then used for seed structure generation using eight different structure generation algorithms. One algorithm was able to generate correct seed structures for 21/39 test compounds. Eleven of these seed structures were large enough to constrain the combinatorial structure generator to fewer than 100,000 structures. In 35/39 cases, at least one algorithm was able to generate a correct seed structure. The DASI method has several limitations and will require further experimental validation and optimization. At present, it seems most useful for identifying the structure of unknown-unknowns with molecular weights <200 Da.

9.

Optimizing artificial neural network models for metabolomics and systems biology: an example using HPLC retention index data.

Hall, L Mark; Hill, Dennis W; Menikarachchi, Lochana C; Chen, Ming-Hui; Hall, Lowell H; Grant, David F.

Bioanalysis ; 7(8): 939-55, 2015.

Article in English | MEDLINE | ID: mdl-25966007

ABSTRACT

BACKGROUND: Artificial Neural Networks (ANN) are extensively used to model 'omics' data. Different modeling methodologies and combinations of adjustable parameters influence model performance and complicate model optimization. METHODOLOGY: We evaluated optimization of four ANN modeling parameters (learning rate annealing, stopping criteria, data split method, network architecture) using retention index (RI) data for 390 compounds. Models were assessed by independent validation (I-Val) using newly measured RI values for 1492 compounds. CONCLUSION: The best model demonstrated an I-Val standard error of 55 RI units and was built using a Ward's clustering data split and a minimally nonlinear network architecture. Use of validation statistics for stopping and final model selection resulted in better independent validation performance than the use of test set statistics.

Subjects

Artificial Intelligence/standards; High-Pressure Liquid Chromatography/methods; Metabolomics; Computer Neural Networks; Systems Biology/standards; Cluster Analysis; Factual Databases; Tandem Mass Spectrometry/methods

10.

Correction of precursor and product ion relative abundances in order to standardize CID spectra and improve Ecom₅₀ accuracy for non-targeted metabolomics.

Dubey, Ritvik; Hill, Dennis W; Lai, Steven; Chen, Ming-Hui; Grant, David F.

Metabolomics ; 11(3): 753-763, 2015 Jun 01.

Article in English | MEDLINE | ID: mdl-25960696

ABSTRACT

Quantitative biases in the abundance of precursor and product ions due to mass discrimination in RF-only ion guides result in inaccurate collision-induced dissociation (CID) spectra. We evaluated the effects of collision cell RF voltage and collision energy on CID spectra using ten singly protonated compounds (46-854 Da) in an orthogonal acceleration time-of-flight mass spectrometer. The relative ion transfer efficiency, i.e., the relative amount of ions transferred through the ion guide at any particular RF voltage, was shown to be dependent on the ion's m/z. We developed an algorithm to correct for the mass discriminating effects of RF voltage on CID spectra. The algorithm was tested for both precursor and product ions at multiple RF voltages and collision energies in order to ensure reliability. Our results suggest that compounds that generate major product ions with m/z values <150 have peak intensities that deviate substantially from their actual abundance. This has implications for small molecule metabolomics research, particularly for studies that rely on CID spectra matching methods for structure identification.

11.

Metabolic pathway predictions for metabolomics: a molecular structure matching approach.

Hamdalla, Mai A; Rajasekaran, Sanguthevar; Grant, David F; Mandoiu, Ion I.

J Chem Inf Model ; 55(3): 709-18, 2015 Mar 23.

Article in English | MEDLINE | ID: mdl-25668446

ABSTRACT

Metabolic pathways are composed of a series of chemical reactions occurring within a cell. In each pathway, enzymes catalyze the conversion of substrates into structurally similar products. Thus, structural similarity provides a potential means for mapping newly identified biochemical compounds to known metabolic pathways. In this paper, we present TrackSM, a cheminformatics tool designed to associate a chemical compound to a known metabolic pathway based on molecular structure matching techniques. Validation experiments show that TrackSM is capable of associating 93% of tested structures to their correct KEGG pathway class and 88% to their correct individual KEGG pathway. This suggests that TrackSM may be a valuable tool to aid in associating previously unknown small molecules to known biochemical pathways and improve our ability to link metabolomics, proteomic, and genomic data sets. TrackSM is freely available at http://metabolomics.pharm.uconn.edu/?q=Software.html .

Subjects

Algorithms; Metabolic Networks and Pathways; Metabolomics/methods; Molecular Structure; Reproducibility of Results; Software

12.

Ion mobility-derived collision cross section as an additional measure for lipid fingerprinting and identification.

Paglia, Giuseppe; Angel, Peggi; Williams, Jonathan P; Richardson, Keith; Olivos, Hernando J; Thompson, J Will; Menikarachchi, Lochana; Lai, Steven; Walsh, Callee; Moseley, Arthur; Plumb, Robert S; Grant, David F; Palsson, Bernhard O; Langridge, James; Geromanos, Scott; Astarita, Giuseppe.

Anal Chem ; 87(2): 1137-44, 2015 Jan 20.

Article in English | MEDLINE | ID: mdl-25495617

ABSTRACT

Despite recent advances in analytical and computational chemistry, lipid identification remains a significant challenge in lipidomics. Ion-mobility spectrometry provides an accurate measure of the molecules' rotationally averaged collision cross-section (CCS) in the gas phase and is thus related to ionic shape. Here, we investigate the use of CCS as a highly specific molecular descriptor for identifying lipids in biological samples. Using traveling wave ion mobility mass spectrometry (MS), we measured the CCS values of over 200 lipids within multiple chemical classes. CCS values derived from ion mobility were not affected by instrument settings or chromatographic conditions, and they were highly reproducible on instruments located in independent laboratories (interlaboratory RSD < 3% for 98% of molecules). CCS values were used as additional molecular descriptors to identify brain lipids using a variety of traditional lipidomic approaches. The addition of CCS improved the reproducibility of analysis in a liquid chromatography-MS workflow and maximized the separation of isobaric species and the signal-to-noise ratio in direct-MS analyses (e.g., "shotgun" lipidomics and MS imaging). These results indicate that adding CCS to databases and lipidomics workflows increases the specificity and selectivity of analysis, thus improving the confidence in lipid identification compared to traditional analytical approaches. The CCS/accurate-mass database described here is made publicly available.

Subjects

Brain/metabolism; Lipids/analysis; Secondary Ion Mass Spectrometry/methods; Aged; Liquid Chromatography; Humans; Signal-to-Noise Ratio

13.

Development of a two-step indirect method for modeling Ecom50.

Hall, Lowell H; Hall, L Mark; Hill, Dennis W; Hawkins, Douglas M; Chen, Ming-Hui; Grant, David F.

Curr Comput Aided Drug Des ; 10(4): 374-82, 2014.

Article in English | MEDLINE | ID: mdl-25549758

ABSTRACT

A novel approach is developed for modeling situations in which the modeled property is an algebraically transformed version of the original experimental data. In many cases such a transformation results in a data set with a significantly smaller data range. Here we explore the effects of range-of-data on modeling statistics. We illustrate a two-step method using data on the mass spectrometry collision energy (CE) that is required to decompose 50% of precursor ions to fragments (CE50). Earlier we showed that a nonlinear center-of-mass transformation, yielding Ecom50, produces values less dependent on the specific mass spectrometric experimental conditions. For this data set the Ecom50 range is 13.5% of the CE50 range. We propose a two-step modeling method. First, the original experimental data, CE50 (larger range-of-data), is modeled by a standard modeling method (PLS). Second, the calculated dependent variable resulting from the modeling is algebraically transformed (not modeled) according to the center-of-mass transformation, providing the generally more useful data, Ecom50. As shown here, use of this two-step method for predicting Ecom50 (from previously published data) produces a standard error 21% smaller and correspondingly reduces the confidence interval for prediction. Some specific implications for prediction are given for a published data set. This work is part of the ongoing development of a system of models to assist in the development of human metabolites.
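The second step of the two-step method (transforming a modeled CE50 rather than modeling Ecom50 directly) can be sketched with the textbook center-of-mass collision energy relation. The abstract does not state the exact formula the authors used, so the relation below, the choice of argon as collision gas, and the example values are assumptions.

```python
ARGON_MASS_DA = 39.948  # assumed collision gas; the abstract does not name one


def to_center_of_mass(ce_lab, ion_mass, gas_mass=ARGON_MASS_DA):
    """Convert a lab-frame collision energy to the center-of-mass frame.

    Textbook relation: E_com = E_lab * m_gas / (m_gas + m_ion).
    """
    return ce_lab * gas_mass / (gas_mass + ion_mass)


# Step 1 (not shown): a regression model such as PLS predicts CE50.
predicted_ce50 = 20.0  # hypothetical model output, lab-frame energy

# Step 2: apply the algebraic transformation to the prediction.
ecom50 = to_center_of_mass(predicted_ce50, ion_mass=300.0)
print(round(ecom50, 3))
```

Because the transformation divides by a term that grows with ion mass, Ecom50 values span a much narrower range than CE50 values, which is the range-of-data effect the abstract examines.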

Subjects

Statistical Models; Humans; Mass Spectrometry

14.

Development of HPLC Retention Index QSAR Models for Nontargeted Metabolomics.

Hall, L Mark; Hill, Dennis W; Hall, Lowell H; Kormos, Tzipporah M; Grant, David F.

Adv Chromatogr ; 51: 241-79, 2014.

Article in English | MEDLINE | ID: mdl-26462375

Subjects

High-Pressure Liquid Chromatography/methods; Metabolomics; Theoretical Models; Quantitative Structure-Activity Relationship

15.

In silico enzymatic synthesis of a 400,000 compound biochemical database for nontargeted metabolomics.

Menikarachchi, Lochana C; Hill, Dennis W; Hamdalla, Mai A; Mandoiu, Ion I; Grant, David F.

J Chem Inf Model ; 53(9): 2483-92, 2013 Sep 23.

Article in English | MEDLINE | ID: mdl-23991755

ABSTRACT

Current methods of structure identification in mass-spectrometry-based nontargeted metabolomics rely on matching experimentally determined features of an unknown compound to those of candidate compounds contained in biochemical databases. A major limitation of this approach is the relatively small number of compounds currently included in these databases. If the correct structure is not present in a database, it cannot be identified, and if it cannot be identified, it cannot be included in a database. Thus, there is an urgent need to augment metabolomics databases with rationally designed biochemical structures using alternative means. Here we present the In Vivo/In Silico Metabolites Database (IIMDB), a database of in silico enzymatically synthesized metabolites, to partially address this problem. The database, which is available at http://metabolomics.pharm.uconn.edu/iimdb/, includes ~23,000 known compounds (mammalian metabolites, drugs, secondary plant metabolites, and glycerophospholipids) collected from existing biochemical databases plus more than 400,000 computationally generated human phase-I and phase-II metabolites of these known compounds. IIMDB features a user-friendly web interface and a programmer-friendly RESTful web service. Ninety-five percent of the computationally generated metabolites in IIMDB were not found in any existing database. However, 21,640 were identical to compounds already listed in PubChem, HMDB, KEGG, or HumanCyc. Furthermore, the vast majority of these in silico metabolites were scored as biological using BioSM, a software program that identifies biochemical structures in chemical structure space. These results suggest that in silico biochemical synthesis represents a viable approach for significantly augmenting biochemical databases for nontargeted metabolomics applications.

Subjects

Factual Databases; Enzymes/metabolism; Metabolomics/methods; Animals; Glycerophospholipids/metabolism; Humans; Internet; Pharmaceutical Preparations/metabolism; Plants/metabolism; User-Computer Interface

16.

BioSM: metabolomics tool for identifying endogenous mammalian biochemical structures in chemical structure space.

Hamdalla, Mai A; Mandoiu, Ion I; Hill, Dennis W; Rajasekaran, Sanguthevar; Grant, David F.

J Chem Inf Model ; 53(3): 601-12, 2013 Mar 25.

Article in English | MEDLINE | ID: mdl-23330685

ABSTRACT

The structural identification of unknown biochemical compounds in complex biofluids continues to be a major challenge in metabolomics research. Using LC/MS, there are currently two major options for solving this problem: searching small biochemical databases, which often do not contain the unknown of interest, or searching large chemical databases, which include large numbers of nonbiochemical compounds. Searching larger chemical databases (larger chemical space) increases the odds of identifying an unknown biochemical compound, but only if nonbiochemical structures can be eliminated from consideration. In this paper we present BioSM, a cheminformatics tool that uses known endogenous mammalian biochemical compounds (as scaffolds) and graph matching methods to identify endogenous mammalian biochemical structures in chemical structure space. The results of a comprehensive set of empirical experiments suggest that BioSM identifies endogenous mammalian biochemical structures with high accuracy. In a leave-one-out cross validation experiment, BioSM correctly predicted 95% of 1388 Kyoto Encyclopedia of Genes and Genomes (KEGG) compounds as endogenous mammalian biochemicals using 1565 scaffolds. Analysis of two additional biological data sets containing 2330 human metabolites (HMDB) and 2416 plant secondary metabolites (KEGG) resulted in biochemical annotations of 89% and 72% of the compounds, respectively. When a data set of 3895 drugs (DrugBank and USAN) was tested, 48% of these structures were predicted to be biochemical. However, when a set of synthetic chemical compounds (Chembridge and Chemsynthesis databases) was examined, only 29% of the 458,207 structures were predicted to be biochemical. Moreover, BioSM predicted that 34% of 883,199 randomly selected compounds from PubChem were biochemical. We then expanded the scaffold list to 3927 biochemical compounds and reevaluated the above data sets to determine whether scaffold number influenced model performance. Although there were significant improvements in model sensitivity and specificity using the larger scaffold list, the data set comparison results were very similar. These results suggest that additional biochemical scaffolds will not further improve our representation of biochemical structure space and that the model is reasonably robust. BioSM provides a qualitative (yes/no) and quantitative (ranking) method for endogenous mammalian biochemical annotation of chemical space and, thus, will be useful in the identification of unknown biochemical structures in metabolomics. BioSM is freely available at http://metabolomics.pharm.uconn.edu.

Subjects

Mammals/metabolism; Metabolomics/methods; Algorithms; Animals; Artificial Intelligence; Body Fluids/chemistry; Cytochromes; Protein Databases; Humans; Chemical Models; Molecular Models; Reproducibility of Results; Small Molecule Libraries

17.

Chemical structure identification in metabolomics: computational modeling of experimental features.

Menikarachchi, Lochana C; Hamdalla, Mai A; Hill, Dennis W; Grant, David F.

Comput Struct Biotechnol J ; 5: e201302005, 2013.

Article in English | MEDLINE | ID: mdl-24688698

ABSTRACT

The identification of compounds in complex mixtures remains challenging despite recent advances in analytical techniques. At present, no single method can detect and quantify the vast array of compounds that might be of potential interest in metabolomics studies. High performance liquid chromatography/mass spectrometry (HPLC/MS) is often considered the analytical method of choice for analysis of biofluids. The positive identification of an unknown involves matching at least two orthogonal HPLC/MS measurements (exact mass, retention index, drift time etc.) against an authentic standard. However, due to the limited availability of authentic standards, an alternative approach involves matching known and measured features of the unknown compound with computationally predicted features for a set of candidate compounds downloaded from a chemical database. Computationally predicted features include retention index, ECOM50 (energy required to decompose 50% of a selected precursor ion in a collision induced dissociation cell), drift time, whether the unknown compound is biological or synthetic and a collision induced dissociation (CID) spectrum. Computational predictions are used to filter the initial "bin" of candidate compounds. The final output is a ranked list of candidates that best match the known and measured features. In this mini review, we discuss cheminformatics methods underlying this database search-filter identification approach.

18.

MolFind: a software package enabling HPLC/MS-based identification of unknown chemical structures.

Menikarachchi, Lochana C; Cawley, Shannon; Hill, Dennis W; Hall, L Mark; Hall, Lowell; Lai, Steven; Wilder, Janine; Grant, David F.

Anal Chem ; 84(21): 9388-94, 2012 Nov 06.

Article in English | MEDLINE | ID: mdl-23039714

ABSTRACT

In this paper, we present MolFind, a highly multithreaded pipeline-type software package for use as an aid in identifying chemical structures in complex biofluids and mixtures. MolFind is specifically designed for high-performance liquid chromatography/mass spectrometry (HPLC/MS) data inputs typical of metabolomics studies where structure identification is the ultimate goal. MolFind enables compound identification by matching HPLC/MS-based experimental data obtained for an unknown compound with computationally derived HPLC/MS values for candidate compounds downloaded from chemical databases such as PubChem. The downloaded "bins" consist of all compounds matching the monoisotopic molecular weight of the unknown. The computational HPLC/MS values predicted include retention index (RI), ECOM50 (energy required to fragment 50% of a selected precursor ion), drift time, and collision-induced dissociation (CID) spectrum. RI, ECOM50, and drift-time models are used for filtering compounds downloaded from PubChem. The remaining candidates are then ranked based on CID spectra matching. Current RI and ECOM50 models allow for the removal of about 28% of compounds from PubChem bins. Our estimates suggest that this could be improved to as much as 87% with additional chemical structures included in the computational models. Quantitative structure-property relationship-based modeling of drift times showed a better correlation with experimentally determined drift times than did Mobcal cross-sectional areas. In 23 of 35 example cases, filtering PubChem bins with RI and ECOM50 predictive models resulted in improved ranking of the unknown compounds compared to previous studies using CID spectra matching alone. In 19 of 35 examples, the correct candidate was ranked within the top 20 compounds in bins containing an average of 1635 compounds.
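
The bin-then-rank flow described in this abstract can be illustrated with a minimal Python sketch. It is not MolFind's code: the `Candidate` record, the 10 ppm mass tolerance, the shared-peak score, and all numeric values are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    mono_mass: float  # monoisotopic mass in Da
    predicted_peaks: list = field(default_factory=list)  # predicted CID fragment m/z


def mass_bin(candidates, unknown_mass, ppm=10.0):
    """Keep candidates whose monoisotopic mass lies within `ppm` of the unknown.

    The ppm tolerance is an assumed parameter, not a value from the abstract.
    """
    tol = unknown_mass * ppm / 1e6
    return [c for c in candidates if abs(c.mono_mass - unknown_mass) <= tol]


def shared_peak_score(predicted, observed, tol=0.01):
    """Fraction of observed fragment peaks matched by any predicted peak.

    A deliberately simple, hypothetical stand-in for CID spectra matching.
    """
    if not observed:
        return 0.0
    hits = sum(any(abs(o - p) <= tol for p in predicted) for o in observed)
    return hits / len(observed)


def rank_by_cid(binned, observed_peaks):
    """Sort the bin so the best spectral match comes first."""
    return sorted(
        binned,
        key=lambda c: shared_peak_score(c.predicted_peaks, observed_peaks),
        reverse=True,
    )


# Toy data, invented for this example only.
toy_candidates = [
    Candidate("A", 224.1889, [143.1, 100.1]),
    Candidate("B", 224.1890, [143.1]),
    Candidate("C", 224.2500, [143.1, 100.1]),
]
toy_bin = mass_bin(toy_candidates, unknown_mass=224.1889)
toy_ranked = rank_by_cid(toy_bin, observed_peaks=[143.1, 100.1, 83.1])
ranked_names = [c.name for c in toy_ranked]
```

In this toy run, candidate C falls outside the mass window and is never scored, which is the point of binning before the more expensive spectral comparison.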

Subjects

Chromatography, High Pressure Liquid/methods; Mass Spectrometry/methods; Software

19.

Correlation of Ecom50 values between mass spectrometers: effect of collision cell radiofrequency voltage on calculated survival yield.

Hill, Dennis W; Baveghems, Clive L; Albaugh, Daniel R; Kormos, Tzipporah M; Lai, Steven; Ng, Hank K; Grant, David F.

Rapid Commun Mass Spectrom ; 26(19): 2303-10, 2012 Oct 15.

Article in English | MEDLINE | ID: mdl-22956322

ABSTRACT

RATIONALE: The determination of the center-of-mass energy at which 50% of a precursor ion decomposes (Ecom50) during collision-induced dissociation (CID) is dependent on the chemical structure of the ion as well as the physical and electrical characteristics of the collision cell. The current study was designed to identify variables influencing Ecom50 values measured on four different mass spectrometers. METHODS: Fifteen test compounds were protonated using positive-mode electrospray ionization and the resulting ions were fragmented across a range of collision energies by CID. Survival yield versus collision energy curves were then used to calculate Ecom50 values for each of these [M+H]⁺ ions on four different mass spectrometers. In addition, the relative recovery of the [M+H]⁺ ions of eight compounds ranging in molecular weight from 46 to 854 Da was determined at collision cell radiofrequency (RF) voltages ranging from 0 to 600 V. RESULTS: Ecom50 values determined on the four instruments were highly correlated (r² values ranged from 0.953 to 0.992). Although these overall correlations were high, we found different maximum ion recoveries depending on collision cell RF voltage. High-mass ions had greater recovery at higher collision cell RF voltages, whereas low-mass ions had greater recovery at lower collision cell RF voltages as well as a broader range of ion recoveries. CONCLUSIONS: Ecom50 values measured on four different instruments correlated surprisingly well given the differences in electrical and physical characteristics of the collision cells. However, our results suggest caution when comparing Ecom50 values or CID spectra between instruments without correcting for the effects of RF voltage on ion transfer efficiency.
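
The calculation of an Ecom50 value from a survival yield curve can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: survival yield is taken as precursor intensity divided by total ion intensity, the laboratory-frame energy is converted with the standard relation E_com = E_lab × m_gas / (m_gas + m_ion), and the 50% point is found by linear interpolation because the abstract does not say how the curves were fitted. The collision gas (argon) and all numeric data are invented for the example.

```python
ARGON_MASS = 39.948  # Da; the collision gas is an assumption, not stated in the abstract


def survival_yield(precursor_intensity, fragment_intensities):
    """Fraction of the precursor ion that survives the collision cell."""
    total = precursor_intensity + sum(fragment_intensities)
    return precursor_intensity / total


def center_of_mass_energy(e_lab, ion_mass, gas_mass=ARGON_MASS):
    """Convert laboratory-frame collision energy to center-of-mass energy."""
    return e_lab * gas_mass / (gas_mass + ion_mass)


def ecom50(e_lab_values, survival_yields, ion_mass, gas_mass=ARGON_MASS):
    """Center-of-mass energy at which the survival yield drops to 50%.

    Uses linear interpolation between the two measurements that bracket 0.5.
    Collision energies must be sorted in ascending order.
    """
    points = list(zip(e_lab_values, survival_yields))
    for (e0, s0), (e1, s1) in zip(points, points[1:]):
        if s0 >= 0.5 >= s1 and s0 != s1:
            e_lab_50 = e0 + (s0 - 0.5) * (e1 - e0) / (s0 - s1)
            return center_of_mass_energy(e_lab_50, ion_mass, gas_mass)
    raise ValueError("survival yield never crosses 50%")


# Toy curve, invented for illustration (energies in eV, ion mass in Da).
toy_energies = [0.0, 10.0, 20.0, 30.0]
toy_yields = [1.0, 0.8, 0.2, 0.0]
toy_ecom50 = ecom50(toy_energies, toy_yields, ion_mass=160.052)
```

Because the center-of-mass conversion scales with ion mass, the same laboratory collision energy corresponds to a lower center-of-mass energy for heavier ions, which is why Ecom50 rather than the raw 50% collision energy is the quantity compared.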

Subjects

Spectrometry, Mass, Electrospray Ionization/methods; Spectrometry, Mass, Electrospray Ionization/standards; Benzimidazoles/chemistry; Ions/chemistry; Linear Models; Models, Chemical; Molecular Weight; Reference Standards

20.

Development of Ecom50 and retention index models for nontargeted metabolomics: identification of 1,3-dicyclohexylurea in human serum by HPLC/mass spectrometry.

Hall, L Mark; Hall, Lowell H; Kertesz, Tzipporah M; Hill, Dennis W; Sharp, Thomas R; Oblak, Edward Z; Dong, Ying W; Wishart, David S; Chen, Ming-Hui; Grant, David F.

J Chem Inf Model ; 52(5): 1222-37, 2012 May 25.

Article in English | MEDLINE | ID: mdl-22489687

ABSTRACT

The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sampled or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small-molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental "features" such as retention time, exact mass, and collision-induced dissociation spectra. Thus, without knowing structure, biological significance remains obscure. In this study, we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds with predicted values substantially different from the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features: MS Ecom50 (the energy in electronvolts required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a data set of 52 compounds, Ecom50 models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back-propagation artificial neural network and 33 Molconn structure descriptors. External validation gave a v² = 0.87 and standard error of 38 retention index units. As a test of the validity of the filtering approach, the Ecom50 and retention index models, along with exact mass and collision-induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids and its elemental formula was identical to 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom50 and retention index predictive models can improve nontargeted metabolite structure identification using HPLC/MS-derived structural features.
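
The filtering step, in which candidates with predicted values substantially different from the unknown are eliminated, can be sketched as a simple window test. This is an illustrative assumption rather than the authors' implementation: the window of three standard errors, the Ecom50 tolerance of 0.5 eV, the dictionary layout, and the toy candidates are all hypothetical. Only the retention index standard error of 38 units comes from the abstract.

```python
RI_STANDARD_ERROR = 38.0  # retention index units, as reported in the abstract


def within_window(predicted, measured, tolerance):
    """True when a predicted feature is close enough to the measured value."""
    return abs(predicted - measured) <= tolerance


def filter_candidates(candidates, measured, tolerances):
    """Keep candidates whose every predicted feature falls inside its window.

    `candidates` is a list of dicts holding predicted feature values;
    `measured` and `tolerances` map the same feature names to numbers.
    """
    return [
        cand
        for cand in candidates
        if all(
            within_window(cand[feature], measured[feature], tolerances[feature])
            for feature in measured
        )
    ]


# Hypothetical window widths: 3 standard errors for RI, 0.5 eV for Ecom50.
toy_tolerances = {"ri": 3 * RI_STANDARD_ERROR, "ecom50": 0.5}
toy_measured = {"ri": 600.0, "ecom50": 1.50}

# Toy predictions, invented for illustration.
toy_candidates = [
    {"name": "cand1", "ri": 650.0, "ecom50": 1.60},  # inside both windows
    {"name": "cand2", "ri": 800.0, "ecom50": 1.55},  # RI too far off
    {"name": "cand3", "ri": 610.0, "ecom50": 2.40},  # Ecom50 too far off
]
toy_kept = filter_candidates(toy_candidates, toy_measured, toy_tolerances)
kept_names = [c["name"] for c in toy_kept]
```

A wider window lowers the risk of discarding the true compound at the cost of leaving more candidates to confirm, so the multiplier is a trade-off that would need to be tuned against validation data.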

Subjects

Chromatography, High Pressure Liquid; Mass Spectrometry; Metabolomics/methods; Models, Biological; Urea/analogs & derivatives; Databases, Factual; Humans; Urea/blood; Urea/chemistry
