Results 1 - 9 of 9
1.
Crit Rev Toxicol ; 54(9): 659-684, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39225123

ABSTRACT

This article aims to provide a comprehensive critical, yet readable, review of general interest to the chemistry community on molecular similarity as applied to chemical informatics and predictive modeling with a special focus on read-across (RA) and read-across structure-activity relationships (RASAR). Molecular similarity-based computational tools, such as quantitative structure-activity relationships (QSARs) and RA, are routinely used to fill the data gaps for a wide range of properties including toxicity endpoints for regulatory purposes. This review will explore the background of RA starting from how structural information has been used through to how other similarity contexts such as physicochemical, absorption, distribution, metabolism, and elimination (ADME) properties, and biological aspects are being characterized. More recent developments of RA's integration with QSAR have resulted in the emergence of novel models such as ToxRead, generalized read-across (GenRA), and quantitative RASAR (q-RASAR). Conventional QSAR techniques have been excluded from this review except where necessary for context.
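As a rough illustration of the similarity-based gap filling that this abstract describes, the sketch below computes a Tanimoto similarity between fingerprint bit sets and uses it for a similarity-weighted read-across estimate. The fingerprints, the measured values, and the function names `tanimoto` and `read_across` are hypothetical; this is not the ToxRead, GenRA, or q-RASAR method, only a minimal instance of the general idea.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(target: set, analogues: list, k: int = 2) -> float:
    """Similarity-weighted mean of the values of the k most similar analogues.

    analogues: list of (fingerprint_bits, measured_value) pairs (hypothetical data).
    """
    scored = sorted(
        ((tanimoto(target, fp), value) for fp, value in analogues),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        raise ValueError("no analogue shares any feature with the target")
    return sum(sim * value for sim, value in scored) / total


# Hypothetical example: three source chemicals with measured endpoint values.
analogues = [({1, 2, 3}, 2.0), ({2, 3, 4}, 4.0), ({9}, 100.0)]
estimate = read_across({1, 2, 3}, analogues, k=2)
```

Restricting the estimate to the k nearest analogues keeps dissimilar chemicals (such as the third one above) from influencing the prediction.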

2.
Anal Bioanal Chem ; 416(10): 2565-2579, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38530399

ABSTRACT

Mass-spectrometry-based non-targeted analysis (NTA), in which mass spectrometric signals are assigned chemical identities based on a systematic collation of evidence, is a growing area of interest for toxicological risk assessment. Successful NTA results in better identification of potentially hazardous pollutants within the environment, facilitating the development of targeted analytical strategies to best characterize risks to human and ecological health. A supporting component of the NTA process involves assessing whether suspected chemicals are amenable to the mass spectrometric method, which is necessary in order to assign an observed signal to the chemical structure. Prior work from this group involved the development of a random forest model for predicting the amenability of 5517 unique chemical structures to liquid chromatography-mass spectrometry (LC-MS). This work improves the interpretability of the group's prior model of the same endpoint, as well as integrating 1348 more data points across negative and positive ionization modes. We enhance interpretability by feature engineering, a machine learning practice that reduces the input dimensionality while attempting to preserve performance statistics. We emphasize the importance of interpretable machine learning models within the context of building confidence in NTA identification. The novel data were curated by the labeling of compounds as amenable or unamenable by expert curators, resulting in an enhanced set of chemical compounds to expand the applicability domain of the prior model. The balanced accuracy benchmark of the newly developed model is comparable to performance previously reported (mean CV BA is 0.84 vs. 0.82 in positive mode, and 0.85 vs. 0.82 in negative mode), while on a novel external set, derived from this work's data, the Matthews correlation coefficients (MCC) for the novel models are 0.66 and 0.68 for positive and negative mode, respectively. Our group's prior published models scored MCC of 0.55 and 0.54 on the same external sets. This demonstrates appreciable improvement over the chemical space captured by the expanded dataset. This work forms part of our ongoing efforts to develop models with higher interpretability and higher performance to support NTA efforts.
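The two statistics quoted in this abstract, balanced accuracy (BA) and the Matthews correlation coefficient (MCC), can be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions; the label vectors are made up for illustration and have no connection to the paper's data, and the helper names are assumptions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (amenable) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
ba = balanced_accuracy(*counts)
m = mcc(*counts)
```

Unlike plain accuracy, both statistics remain informative when the amenable and unamenable classes are of unequal size.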

3.
J Cheminform ; 16(1): 19, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38378618

ABSTRACT

The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, the common concern is the quality of both the chemical structure information and associated experimental data. This is especially true when those data are collected from multiple sources as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two- and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative QSAR modeling projects to ensure consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the "QSAR-ready" workflow to generate "MS-ready structures" to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community.

Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it optimizes molecular descriptors' accuracy and reliability. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.

4.
J Cheminform ; 15(1): 92, 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37798806
5.
RNA ; 29(11): 1644-1657, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37580126

ABSTRACT

The identification of catalytic RNAs is typically achieved through primarily experimental means. However, only a small fraction of sequence space can be analyzed even with high-throughput techniques. Methods to extrapolate from a limited data set to predict additional ribozyme sequences, particularly in a human-interpretable fashion, could be useful both for designing new functional RNAs and for generating greater understanding about a ribozyme fitness landscape. Using information theory, we express the effects of epistasis (i.e., deviations from additivity) on a ribozyme. This representation was incorporated into a simple model of the epistatic fitness landscape, which identified potentially exploitable combinations of mutations. We used this model to theoretically predict mutants of high activity for a self-aminoacylating ribozyme, identifying potentially active triple and quadruple mutants beyond the experimental data set of single and double mutants. The predictions were validated experimentally, with nine out of nine sequences being accurately predicted to have high activity. This set of sequences included mutants that form a previously unknown evolutionary "bridge" between two ribozyme families that share a common motif. Individual steps in the method could be examined, understood, and guided by a human, combining interpretability and performance in a simple model to predict ribozyme sequences by extrapolation.
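To make "deviations from additivity" concrete, the sketch below computes a conventional pairwise epistasis term on a log-activity scale and the activity a multi-mutant would have if single-mutant effects were purely additive. This is a common textbook formulation chosen for illustration; it is not the information-theoretic representation used in the paper, and the activity values and function names are hypothetical.

```python
import math


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Epistasis as deviation from additivity on a log scale.

    epsilon = ln f_ab - ln f_a - ln f_b + ln f_wt
    epsilon = 0 means the two mutations combine additively (in log activity).
    """
    return math.log(f_ab) - math.log(f_a) - math.log(f_b) + math.log(f_wt)


def additive_prediction(f_wt, single_mutant_activities):
    """Activity expected for a multi-mutant if every single-mutant effect
    simply added up on the log scale (no epistasis)."""
    log_f = math.log(f_wt)
    for f in single_mutant_activities:
        log_f += math.log(f) - math.log(f_wt)
    return math.exp(log_f)


# Hypothetical activities relative to wild type (f_wt = 1.0).
eps_none = pairwise_epistasis(1.0, 0.5, 0.5, 0.25)      # additive case
eps_positive = pairwise_epistasis(1.0, 0.5, 0.5, 0.5)   # double mutant better than expected
expected_double = additive_prediction(1.0, [0.5, 0.5])
```

A positive epsilon flags a combination of mutations that performs better than the additive expectation, which is the kind of combination a landscape model could try to exploit.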


Subjects
RNA, Catalytic , Humans , RNA, Catalytic/genetics , RNA, Catalytic/metabolism , Epistasis, Genetic , Mutation , Biological Evolution , Genetic Fitness
6.
Chem Res Toxicol ; 36(3): 465-478, 2023 Mar 20.
Article in English | MEDLINE | ID: mdl-36877669

ABSTRACT

The need for careful assembly, training, and validation of quantitative structure-activity/property models (QSAR/QSPR) is more significant than ever as data sets become larger and sophisticated machine learning tools become increasingly ubiquitous and accessible to the scientific community. Regulatory agencies such as the United States Environmental Protection Agency must carefully scrutinize each aspect of a resulting QSAR/QSPR model to determine its potential use in environmental exposure and hazard assessment. Herein, we revisit the goals of the Organisation for Economic Cooperation and Development (OECD) in our application and discuss the validation principles for structure-activity models. We apply these principles to a model for predicting water solubility of organic compounds derived using random forest regression, a common machine learning approach in the QSA/PR literature. Using public sources, we carefully assembled and curated a data set consisting of 10,200 unique chemical structures with associated water solubility measurements. This data set was then used as a focal narrative to methodically consider the OECD's QSA/PR principles and how they can be applied to random forests. Despite some expert, mechanistically informed supervision of descriptor selection to enhance model interpretability, we achieved a model of water solubility with comparable performance to previously published models (5-fold cross-validated performance: R² of 0.81 and RMSE of 0.98). We hope this work will catalyze a necessary conversation around the importance of cautiously modernizing and explicitly leveraging OECD principles while pursuing state-of-the-art machine learning approaches to derive QSA/PR models suitable for regulatory consideration.
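The performance figures in this abstract are a coefficient of determination and a root-mean-square error obtained under 5-fold cross-validation. The sketch below shows the standard definitions of both statistics plus a simple fold splitter. The splitter uses contiguous, unshuffled folds as a simplifying assumption, the solubility values are invented, and none of this reproduces the authors' random forest pipeline.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical observed vs. predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
score_r2 = r_squared(observed, predicted)
score_rmse = rmse(observed, predicted)
```

In practice the data would be shuffled before splitting, and each statistic would be computed on the held-out fold and then averaged over the five folds.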


Subjects
Organisation for Economic Co-Operation and Development , Quantitative Structure-Activity Relationship , Solubility , Algorithms , Water/chemistry
7.
J Phys Chem B ; 124(37): 8012-8022, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32790375

ABSTRACT

Variational autoencoders are artificial neural networks with the capability to reduce highly dimensional sets of data to smaller dimensional, latent representations. In this work, these models are applied to molecular dynamics simulations of the self-assembly of coarse-grained peptides to obtain a single-valued order parameter for amyloid aggregation. This automatically learned order parameter is constructed by time-averaging the latent parametrizations of internal coordinate representations and compared to the nematic order parameter which is commonly used to study ordering of similar systems in literature. It is found that the latent space value provides more tailored insight into the aggregation mechanism's details, correctly identifying fibril formation in instances where the nematic order parameter fails to do so. A means is provided by which the latent space value can be analyzed so that the major contributing internal coordinates are identified, allowing for a direct interpretation of the latent space order parameter in terms of the behavior of the system. The latent model is found to be an effective and convenient way of representing the data from the dynamic ensemble and provides a means of reducing the dimensionality of a system whose scale exceeds molecular systems so far considered with similar tools. This bypasses a need for researcher speculation on what elements of a system best contribute to summarizing major transitions and suggests latent models are effective and insightful when applied to large systems with a diversity of complex behaviors.
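For readers unfamiliar with the baseline that the learned order parameter is compared against, the sketch below evaluates a nematic order parameter as the average second Legendre polynomial of the angle between each molecular axis and a director. It assumes the director is already known, which is a simplification: the usual procedure obtains the director from the largest eigenvalue of an ordering tensor. The vectors and the function name `nematic_order` are hypothetical, and this is not the paper's analysis code.

```python
import math


def nematic_order(vectors, director):
    """Average of P2(cos theta) = (3 cos^2 theta - 1) / 2 over all vectors.

    S = 1 for perfect alignment with the director (parallel or antiparallel),
    S = 0 for an isotropic distribution, S = -0.5 if all vectors are perpendicular.
    """
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return tuple(c / norm for c in v)

    n = unit(director)
    total = 0.0
    for v in vectors:
        u = unit(v)
        cos_theta = sum(a * b for a, b in zip(u, n))
        total += 1.5 * cos_theta ** 2 - 0.5
    return total / len(vectors)


# Hypothetical peptide backbone axes.
aligned = [(0, 0, 1), (0, 0, -2), (0, 0, 3)]
mixed = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
s_aligned = nematic_order(aligned, (0, 0, 1))
s_mixed = nematic_order(mixed, (0, 0, 1))
```

Because the measure depends only on cos² of the angle, it treats parallel and antiparallel axes identically, which is appropriate for head-tail symmetric ordering.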


Subjects
Molecular Dynamics Simulation , Peptides , Amyloid , Neural Networks, Computer
8.
J Phys Chem B ; 123(25): 5256-5264, 2019 Jun 27.
Article in English | MEDLINE | ID: mdl-31150250

ABSTRACT

Despite the importance of amyloid formation in disease pathology, the understanding of the primary structure-activity relationship for amyloid-forming peptides remains elusive. Here we use a new neural-network based method of analysis: the classifying autoencoder (CAE). This machine learning technique uses specialized architecture of artificial neural networks to provide insight into typically opaque classification processes. The method proves to be robust to noisy and limited data sets, as well as being capable of disentangling relatively complicated rules over data sets. We demonstrate its capabilities by applying the technique to an experimental database (the Waltz database) and demonstrate the CAE's capability to provide insight into a novel descriptor, dimeric isotropic deviation, an experimental measure of the aggregation properties of the amino acids. We measure this value for all 20 of the common amino acids and find correlation between dimeric isotropic deviation and the failure to form amyloids when hydrophobic effects are not a primary driving force in amyloid formation. These applications show the value of the new method and provide a flexible and general framework to approach problems in biochemistry using artificial neural networks.


Subjects
Amyloid/chemistry , Amyloidogenic Proteins/chemistry , Machine Learning , Peptides/chemistry , Amino Acids/chemistry , Amino Acids/metabolism , Amyloid/metabolism , Amyloidogenic Proteins/metabolism , Databases, Factual , Dimerization , Hydrophobic and Hydrophilic Interactions , Peptides/metabolism , Protein Aggregates
9.
J Comput Chem ; 38(16): 1353-1361, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28236312

ABSTRACT

The conformational states adopted by a polymer chain in water are a result of a delicate balance between intra-molecular and water-mediated interactions. Using an explicit representation of the solvent is, however, computationally expensive and it is often necessary to turn to implicit representations. We present a systematic derivation of implicit models of water and study the effect of simplifying the representation of the solvent on the conformations of hydrophobic homopolymers of varying length. Starting from the explicit coarse-grained single site mW water model, we develop an implicit solvent model that reproduces the free energy of the contact pair between two hydrophobic monomers, an implicit solvent model that captures the free energy of contact pair minima, desolvation barrier, and solvent-separated minima, and finally, we consider vacuum simulations. We generate potentials of mean force for polymers of various lengths in explicit water, the implicit solvents and vacuum, using umbrella sampling and replica exchange molecular dynamics simulations. Surprisingly, vacuum simulations outperform the implicit solvent simulations, with the implicit model involving a desolvation barrier producing spurious extended polymer conformations. © 2017 Wiley Periodicals, Inc.
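A potential of mean force (PMF) relates the probability of observing a coordinate value to a free energy. The sketch below applies direct Boltzmann inversion, W = -kT ln(P / P_max), to an unbiased histogram. This is a deliberate simplification: the study used umbrella sampling and replica exchange, where biased windows must first be reweighted and combined, and that step is not shown here. The histogram counts and the function name are hypothetical, and kT is expressed in arbitrary energy units.

```python
import math


def pmf_from_histogram(counts, kT=1.0):
    """Boltzmann-invert a histogram: W_i = -kT * ln(count_i / max_count).

    The most populated bin is the zero of free energy; empty bins map to +inf.
    """
    max_count = max(counts)
    return [
        -kT * math.log(c / max_count) if c > 0 else math.inf
        for c in counts
    ]


# Hypothetical histogram over a monomer-monomer distance coordinate.
counts = [10, 100, 10, 0]
profile = pmf_from_histogram(counts, kT=1.0)
```

Features such as a contact minimum, a desolvation barrier, and a solvent-separated minimum would appear as successive wells and peaks in such a profile.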
