1.
Anal Bioanal Chem ; 416(10): 2565-2579, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38530399

ABSTRACT

Mass-spectrometry-based non-targeted analysis (NTA), in which mass spectrometric signals are assigned chemical identities based on a systematic collation of evidence, is a growing area of interest for toxicological risk assessment. Successful NTA results in better identification of potentially hazardous pollutants within the environment, facilitating the development of targeted analytical strategies to best characterize risks to human and ecological health. A supporting component of the NTA process involves assessing whether suspected chemicals are amenable to the mass spectrometric method, which is necessary in order to assign an observed signal to the chemical structure. Prior work from this group involved the development of a random forest model for predicting the amenability of 5517 unique chemical structures to liquid chromatography-mass spectrometry (LC-MS). This work improves the interpretability of the group's prior model of the same endpoint, as well as integrating 1348 more data points across negative and positive ionization modes. We enhance interpretability by feature engineering, a machine learning practice that reduces the input dimensionality while attempting to preserve performance statistics. We emphasize the importance of interpretable machine learning models within the context of building confidence in NTA identification. The novel data were curated by the labeling of compounds as amenable or unamenable by expert curators, resulting in an enhanced set of chemical compounds to expand the applicability domain of the prior model. The balanced accuracy benchmark of the newly developed model is comparable to performance previously reported (mean CV BA is 0.84 vs. 0.82 in positive mode, and 0.85 vs. 0.82 in negative mode), while on a novel external set, derived from this work's data, the Matthews correlation coefficients (MCC) for the novel models are 0.66 and 0.68 for positive and negative mode, respectively. Our group's prior published models scored MCC of 0.55 and 0.54 on the same external sets. This demonstrates appreciable improvement over the chemical space captured by the expanded dataset. This work forms part of our ongoing efforts to develop models with higher interpretability and higher performance to support NTA efforts.
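
The two statistics quoted in this abstract, balanced accuracy (BA) and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below is only an illustration of the standard definitions, not the authors' evaluation code; the label vectors are made up, and the functions assume both classes occur in `y_true`.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = amenable, 0 = unamenable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(y_true, y_pred):
    """MCC; returns 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up labels for ten chemicals, purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy(y_true, y_pred), matthews_corrcoef(y_true, y_pred))
```

Unlike plain accuracy, both statistics stay informative when amenable and unamenable chemicals are unevenly represented.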

2.
Chem Res Toxicol ; 36(3): 465-478, 2023 03 20.
Article in English | MEDLINE | ID: mdl-36877669

ABSTRACT

The need for careful assembly, training, and validation of quantitative structure-activity/property models (QSAR/QSPR) is more significant than ever as data sets become larger and sophisticated machine learning tools become increasingly ubiquitous and accessible to the scientific community. Regulatory agencies such as the United States Environmental Protection Agency must carefully scrutinize each aspect of a resulting QSAR/QSPR model to determine its potential use in environmental exposure and hazard assessment. Herein, we revisit the goals of the Organisation for Economic Cooperation and Development (OECD) in our application and discuss the validation principles for structure-activity models. We apply these principles to a model for predicting water solubility of organic compounds derived using random forest regression, a common machine learning approach in the QSA/PR literature. Using public sources, we carefully assembled and curated a data set consisting of 10,200 unique chemical structures with associated water solubility measurements. This data set was then used as a focal narrative to methodically consider the OECD's QSA/PR principles and how they can be applied to random forests. Despite some expert, mechanistically informed supervision of descriptor selection to enhance model interpretability, we achieved a model of water solubility with comparable performance to previously published models (5-fold cross-validated R² of 0.81 and RMSE of 0.98). We hope this work will catalyze a necessary conversation around the importance of cautiously modernizing and explicitly leveraging OECD principles while pursuing state-of-the-art machine learning approaches to derive QSA/PR models suitable for regulatory consideration.
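
For readers unfamiliar with the reported statistics, the sketch below shows how R² and RMSE are conventionally defined and one simple way to partition a data set into five cross-validation folds. It is a generic illustration, not the authors' pipeline: the interleaved fold assignment and the numeric values are assumptions, and the abstract does not state the solubility units used.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold i takes every k-th item from i."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Hypothetical observed and predicted values, for illustration only.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

In a cross-validated setting, each structure is predicted by a model trained on the folds that exclude it, and the statistics are then computed on those held-out predictions.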


Subject(s)
Organisation for Economic Co-Operation and Development , Quantitative Structure-Activity Relationship , Solubility , Algorithms , Water/chemistry
3.
Anal Bioanal Chem ; 414(17): 4919-4933, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35699740

ABSTRACT

Non-targeted analysis (NTA) methods are widely used for chemical discovery but seldom employed for quantitation due to a lack of robust methods to estimate chemical concentrations with confidence limits. Herein, we present and evaluate new statistical methods for quantitative NTA (qNTA) using high-resolution mass spectrometry (HRMS) data from EPA's Non-Targeted Analysis Collaborative Trial (ENTACT). Experimental intensities of ENTACT analytes were observed at multiple concentrations using a semi-automated NTA workflow. Chemical concentrations and corresponding confidence limits were first estimated using traditional calibration curves. Two qNTA estimation methods were then implemented using experimental response factor (RF) data (where RF = intensity/concentration). The bounded response factor method used a non-parametric bootstrap procedure to estimate select quantiles of training set RF distributions. Quantile estimates then were applied to test set HRMS intensities to inversely estimate concentrations with confidence limits. The ionization efficiency estimation method restricted the distribution of likely RFs for each analyte using ionization efficiency predictions. Given the intended future use for chemical risk characterization, predicted upper confidence limits (protective values) were compared to known chemical concentrations. Using traditional calibration curves, 95% of upper confidence limits were within ~tenfold of the true concentrations. The error increased to ~60-fold (ESI+) and ~120-fold (ESI-) for the ionization efficiency estimation method and to ~150-fold (ESI+) and ~130-fold (ESI-) for the bounded response factor method. This work demonstrates successful implementation of confidence limit estimation strategies to support qNTA studies and marks a crucial step towards translating NTA data in a risk-based context.
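
The bounded response factor idea described above can be sketched in a few lines: bootstrap the training-set RFs to estimate a low and a high quantile, then invert RF = intensity/concentration at those bounds. This is a minimal sketch under stated assumptions, not the published procedure: the 2.5th and 97.5th percentiles, the use of the bootstrap median as the point estimate, and all numeric values are illustrative choices that the abstract does not specify.

```python
import random


def quantile(values, q):
    """Linear-interpolation quantile of a list, with 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_quantile(rfs, q, n_boot=2000, seed=0):
    """Median of the q-th quantile across bootstrap resamples of the RFs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(rfs) for _ in rfs]  # sample with replacement
        estimates.append(quantile(resample, q))
    return quantile(estimates, 0.5)


def concentration_bounds(intensity, rf_low, rf_high):
    """Invert RF = intensity / concentration at the two RF bounds."""
    lower = intensity / rf_high  # a high RF implies a low concentration
    upper = intensity / rf_low   # a low RF implies a high (protective) value
    return lower, upper


# Hypothetical training-set response factors (intensity per unit concentration).
training_rfs = [1.2e4, 3.5e4, 8.0e4, 1.5e5, 4.0e5, 9.0e5, 2.2e6, 5.0e6]
rf_low = bootstrap_quantile(training_rfs, 0.025)
rf_high = bootstrap_quantile(training_rfs, 0.975)
lower, upper = concentration_bounds(1.0e6, rf_low, rf_high)
print(lower, upper)
```

Because response factors span orders of magnitude across chemicals, the resulting interval is wide, which is consistent with the fold-errors the abstract reports for this method.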


Subject(s)
Uncertainty , Calibration , Mass Spectrometry/methods
4.
J Chem Inf Model ; 61(2): 565-570, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33481596

ABSTRACT

The core goal of cheminformatics is to efficiently store robust and accurate chemical information and make it accessible for drug discovery, environmental analysis, and the development of prediction models including quantitative structure-activity relationships (QSAR). The U.S. Environmental Protection Agency (EPA) has developed a web-based application, the CompTox Chemicals Dashboard, which provides access to a compilation of data generated within the agency and sourced from public databases and literature and to utilities for real-time QSAR prediction and chemical read-across. While the vast majority of online tools only allow interrogation of chemicals one at a time, the Dashboard provides a batch search feature that allows for the sourcing of data based on thousands of chemical inputs at one time, by chemical identifier (e.g., names, Chemical Abstract Service registry numbers, or InChIKeys), or by mass or molecular formulas. Chemical information that can then be sourced via the batch search includes chemical identifiers and structures; intrinsic, physicochemical and fate and transport properties; in vitro and in vivo toxicity data; and the presence in environmentally relevant lists. We outline how to use the batch search feature and provide an overview regarding the type of information that can be sourced by considering a series of typical-use questions.
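
Before submitting thousands of inputs to a batch search, it can help to sort them by identifier type and catch malformed entries. The sketch below is a hypothetical pre-processing helper, not part of the Dashboard: it relies only on the public CAS Registry Number check-digit rule and the standard 14-10-1 InChIKey layout, and the function names are illustrative.

```python
import re

INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CASRN_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(text):
    """Check the CASRN pattern and its check digit."""
    match = CASRN_RE.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... from right to left; the sum mod 10
    # must equal the final check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_identifier(text):
    """Label one input as 'inchikey', 'casrn', 'invalid_casrn', or 'name'."""
    text = text.strip()
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if CASRN_RE.match(text):
        return "casrn" if is_valid_casrn(text) else "invalid_casrn"
    return "name"


def group_for_batch(inputs):
    """Group raw inputs by identifier type, preserving input order."""
    groups = {}
    for item in inputs:
        groups.setdefault(classify_identifier(item), []).append(item.strip())
    return groups


batch = ["58-08-2", "RYYVLZVUVIJVGH-UHFFFAOYSA-N", "caffeine", "58-08-3"]
print(group_for_batch(batch))
```

Grouping inputs this way lets each batch be submitted under a single identifier type, and flags entries such as a mistyped CASRN before any search is run.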


Subject(s)
Quantitative Structure-Activity Relationship , Databases, Factual , United States , United States Environmental Protection Agency
5.
Environ Sci Technol ; 55(16): 11375-11387, 2021 08 17.
Article in English | MEDLINE | ID: mdl-34347456

ABSTRACT

Recycled materials are found in many consumer products as part of a circular economy; however, the chemical content of recycled products is generally uncharacterized. A suspect screening analysis using two-dimensional gas chromatography time-of-flight mass spectrometry (GC × GC-TOFMS) was applied to 210 products (154 recycled, 56 virgin) across seven categories. Chemicals in products were tentatively identified using a standard spectral library or confirmed using chemical standards. A total of 918 probable chemical structures were identified (112 of which were confirmed) in recycled materials versus 587 (110 confirmed) in virgin materials. Identified chemicals were characterized in terms of their functional use and structural class. Recycled paper products and construction materials contained greater numbers of chemicals than virgin products; 733 identified chemicals had greater occurrence in recycled compared to virgin materials. Products made from recycled materials contained greater numbers of fragrances, flame retardants, solvents, biocides, and dyes. The results were clustered to identify groups of chemicals potentially associated with unique chemical sources, and identified chemicals were prioritized for further study using high-throughput hazard and exposure information. While occurrence is not necessarily indicative of risk, these results can be used to inform the expansion of existing models or identify exposure pathways currently neglected in exposure assessments.


Subject(s)
Flame Retardants , Construction Materials , Flame Retardants/analysis , Gas Chromatography-Mass Spectrometry , Recycling
6.
Anal Bioanal Chem ; 413(30): 7495-7508, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34648052

ABSTRACT

With the increasing availability of high-resolution mass spectrometers, suspect screening and non-targeted analysis are becoming popular compound identification tools for environmental researchers. Samples of interest often contain a large (unknown) number of chemicals spanning the detectable mass range of the instrument. In an effort to separate these chemicals prior to injection into the mass spectrometer, a chromatography method is often utilized. There are numerous types of gas and liquid chromatographs that can be coupled to commercially available mass spectrometers. Depending on the type of instrument used for analysis, the researcher is likely to observe a different subset of compounds based on the amenability of those chemicals to the selected experimental techniques and equipment. It would be advantageous if this subset of chemicals could be predicted prior to conducting the experiment, in order to minimize potential false-positive and false-negative identifications. In this work, we utilize experimental datasets to predict the amenability of chemical compounds to detection with liquid chromatography-electrospray ionization-mass spectrometry (LC-ESI-MS). The assembled dataset totals 5517 unique chemicals either explicitly detected or not detected with LC-ESI-MS. The resulting detected/not-detected matrix has been modeled using specific molecular descriptors to predict which chemicals are amenable to LC-ESI-MS, and to which form(s) of ionization. Random forest models, including a measure of the applicability domain of the model for both positive and negative modes of the electrospray ionization source, were successfully developed. The outcome of this work will help to inform future suspect screening and non-targeted analyses of chemicals by better defining the potential LC-ESI-MS detectable chemical landscape of interest.
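
The detected/not-detected matrix mentioned above can be pictured as one row per chemical with a label for each ionization mode. The sketch below shows one hypothetical way to assemble such a matrix from individual observations; the record layout, the chemical names, and the rule that any positive observation marks a chemical as detected are assumptions, not details taken from the paper.

```python
def build_detection_matrix(records):
    """Collapse (chemical, mode, detected) observations into rows per chemical.

    A mode stays None when a chemical was never tested in that mode.
    """
    matrix = {}
    for chemical, mode, detected in records:
        row = matrix.setdefault(chemical, {"ESI+": None, "ESI-": None})
        # Assumption: one positive observation is enough to call it detected.
        row[mode] = bool(detected) or bool(row[mode])
    return matrix


def labels_for_mode(matrix, mode):
    """Return {chemical: 0 or 1} for chemicals that have a label in this mode."""
    return {
        chemical: int(row[mode])
        for chemical, row in matrix.items()
        if row[mode] is not None
    }


# Hypothetical observations, for illustration only.
records = [
    ("chem_A", "ESI+", True),
    ("chem_A", "ESI-", False),
    ("chem_B", "ESI+", False),
    ("chem_B", "ESI+", True),
    ("chem_C", "ESI-", False),
]
matrix = build_detection_matrix(records)
print(labels_for_mode(matrix, "ESI+"))
print(labels_for_mode(matrix, "ESI-"))
```

Each per-mode label set can then serve as the target for a separate classifier, which mirrors the abstract's description of separate models for positive and negative mode.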

7.
J Cheminform ; 16(1): 19, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38378618

ABSTRACT

The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, the common concern is the quality of both the chemical structure information and associated experimental data. This is especially true when those data are collected from multiple sources as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative modeling QSAR projects to ensure consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the "QSAR-ready" workflow to generate "MS-ready structures" to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community.

Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it optimizes molecular descriptors' accuracy and reliability. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.

8.
Front Environ Sci ; 10: 1-13, 2022 Apr 05.
Article in English | MEDLINE | ID: mdl-35936994

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) are a class of man-made chemicals of global concern for many health and regulatory agencies due to their widespread use and persistence in the environment (in soil, air, and water), bioaccumulation, and toxicity. This concern has catalyzed a need to aggregate data to support research efforts that can, in turn, inform regulatory and statutory actions. An ongoing challenge regarding PFAS has been the shifting definition of what qualifies a substance to be a member of the PFAS class. There is no single definition for a PFAS, but various attempts have been made to utilize substructural definitions that either encompass broad working scopes or satisfy narrower regulatory guidelines. Depending on the size and specificity of PFAS substructural filters applied to the U.S. Environmental Protection Agency (EPA) DSSTox database, currently exceeding 900,000 unique substances, PFAS substructure-defined space can span hundreds to tens of thousands of compounds. This manuscript reports on the curation of PFAS chemicals and assembly of lists that have been made publicly available to the community via the EPA's CompTox Chemicals Dashboard. Creation of these PFAS lists required the harvesting of data from EPA and online databases, peer-reviewed publications, and regulatory documents. These data have been extracted and manually curated, annotated with structures, and made available to the community in the form of lists defined by structure filters, as well as lists comprising non-structurable PFAS, such as polymers and complex mixtures. These lists, along with their associated linkages to predicted and measured data, are fueling PFAS research efforts within the EPA and are serving as a valuable resource to the international scientific community.

9.
J Breath Res ; 15(2): 025001, 2021 03 18.
Article in English | MEDLINE | ID: mdl-33734097

ABSTRACT

The U.S. EPA CompTox Chemicals Dashboard is a freely available web-based application providing access to chemistry, toxicity, and exposure data for ∼900 000 chemicals. Data, search functionality, and prediction models within the Dashboard can help identify chemicals found in environmental analyses and human biomonitoring. It was designed to deliver data generated to support computational toxicology to reduce chemical testing on animals and provide access to new approach methodologies including prediction models. The inclusion of mass and formula-based searches, together with relevant ranking approaches, allows for the identification and prioritization of exogenous (environmental) chemicals from high resolution mass spectrometry in need of further evaluation. The Dashboard includes chemicals that can be detected by liquid chromatography, gas chromatography-mass spectrometry (GC-MS) and direct-MS analyses, and chemical lists have been added that highlight breath-borne volatile and semi-volatile organic compounds. The Dashboard can be searched using various chemical identifiers (e.g. chemical synonyms, CASRN and InChIKeys), chemical formula, MS-ready formulae, monoisotopic mass, consumer product categories and assays/genes associated with high-throughput screening data. An integrated search at a chemical level performs searches against PubMed to identify relevant published literature. This article describes specific procedures using the Dashboard as a first-stop tool for exploring both targeted and non-targeted results from GC-MS analyses of chemicals found in breath, exhaled breath condensate, and associated aerosols.
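
A mass-based search of the kind described above amounts to comparing a query mass against the monoisotopic masses of candidate formulae within a tolerance. The sketch below illustrates that idea locally and does not call or reproduce the Dashboard's search; the small candidate table, the 5 ppm tolerance, and the simple formula parser (no parentheses, charges, or isotope labels) are assumptions.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
    "F": 18.99840316, "P": 30.97376200, "S": 31.97207117,
    "Cl": 34.96885268, "Br": 78.9183376,
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def search_by_mass(query_mass, candidates, ppm=5.0):
    """Return candidate names whose neutral mass lies within the ppm window."""
    tolerance = query_mass * ppm / 1e6
    return [
        name
        for name, formula in candidates.items()
        if abs(monoisotopic_mass(formula) - query_mass) <= tolerance
    ]


# Hypothetical candidate list keyed by name, with neutral molecular formulae.
candidates = {
    "caffeine": "C8H10N4O2",
    "nicotine": "C10H14N2",
    "perfluorooctanoic acid": "C8HF15O2",
}
print(search_by_mass(194.0804, candidates))
```

The query here is a neutral mass; an observed m/z would first need an adduct correction (for example, accounting for the added proton in an [M+H]+ ion) before it is compared against neutral formulae.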


Subject(s)
Breath Tests , Animals , Chromatography, Liquid , Gas Chromatography-Mass Spectrometry , Humans , United States , United States Environmental Protection Agency , Volatile Organic Compounds