ABSTRACT
Regulatory authorities aim to organize substances into groups to facilitate prioritization within hazard and risk assessment processes. Often, such chemical groupings are not explicitly defined by structural rules or physicochemical property information. This is largely due to how these groupings are developed, namely, a manual expert curation process, which in turn makes updating and refining groupings, as new substances are evaluated, a practical challenge. Herein, machine learning methods were leveraged to build models that could preliminarily assign substances to predefined groups. A set of 86 groupings containing 2,184 substances, as published on the European Chemicals Agency (ECHA) website, was mapped to the U.S. Environmental Protection Agency (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database content to extract chemical and structural information. Substances were represented using Morgan fingerprints, and two machine learning approaches were used to classify test substances into 56 groups containing at least 10 substances with a structural representation in the data set: k-nearest neighbor (kNN) and random forest (RF), which led to mean 5-fold cross-validation test accuracies (average F1 scores) of 0.781 and 0.853, respectively. With a 9% improvement, the RF classifier was significantly more accurate than kNN (p-value = 0.001). The approach offers promise as a means of the initial profiling of new substances into predefined groups to facilitate prioritization efforts and streamline the assessment of new substances when earlier groupings are available. The algorithm to fit and use these models has been made available in the accompanying repository, thereby enabling both use of the produced models and refitting of these models, as new groupings become available by regulatory authorities or industry.
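As a rough illustration of the nearest-neighbor idea described in this abstract, the sketch below assigns a query substance to the majority group among its most similar labelled neighbors. It is not the authors' code: the fingerprints are toy sets of on-bit indices rather than real Morgan fingerprints, the use of Tanimoto similarity is an assumption (the abstract does not name a metric), and the names `tanimoto`, `knn_assign`, `group_A` and `group_B` are hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_assign(query, labelled, k=3):
    """Return the majority group among the k labelled fingerprints most similar to `query`."""
    ranked = sorted(labelled, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(group for _, group in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical): each entry is (on-bit indices, group label).
labelled = [
    (frozenset({1, 2, 3, 4}), "group_A"),
    (frozenset({1, 2, 3, 5}), "group_A"),
    (frozenset({2, 3, 4, 6}), "group_A"),
    (frozenset({10, 11, 12}), "group_B"),
    (frozenset({10, 11, 13}), "group_B"),
    (frozenset({11, 12, 14}), "group_B"),
]

print(knn_assign(frozenset({1, 2, 4, 5}), labelled))
```

In practice the fingerprints would come from a cheminformatics toolkit and the classifier from a machine learning library; the pure-Python version only makes the voting logic explicit.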
Subjects
Algorithms , Machine Learning , United States , United States Environmental Protection Agency , Databases, Factual
ABSTRACT
A new IUCLID database is provided containing results from non-clinical animal studies and human information for 530 approved drugs. The database was developed by extracting data from pharmacological reviews of repeat-dose, carcinogenicity, developmental, and reproductive toxicity studies. In the database, observed and no-observed effects are linked to the respective effect levels, including information on severity/incidence and transiency/reversibility. It also includes some information on effects in humans, which was extracted from relevant sections of standard product labels of the approved drugs. The database is complemented with a specific ontology for reporting effects that was developed as an improved version of the Ontology Lookup Service's mammalian and human phenotype ontologies and includes different hierarchical levels. The developed ontology contains novel and unique standardized terms, including ontological terms for reproductive and endocrine effects. The database aims to facilitate correlation and concordance analyses based on the link between observed and no-observed effects and their respective effect levels. In addition, it offers a robust dataset on drug information for the pharmaceutical industry and research. The reported ontology supports the analyses of toxicological information, especially for reproductive and endocrine endpoints, and can be used to encode legacy data or develop additional ontologies. The new database and ontology can be used to support the development of alternative non-animal approaches, to elucidate mechanisms of toxicity, and to analyse human relevance. The new IUCLID database is provided free of charge at https://iuclid6.echa.europa.eu/us-fda-toxicity-data.
Subjects
Drug Industry , Endocrine System , Animals , Humans , Databases, Factual , Pharmaceutical Preparations , Mammals
ABSTRACT
The present study primarily aims at informing regulators and policy makers in Europe and examines the evolution of self-classifications and study availability for the endpoints of carcinogenicity, mutagenicity, reproductive toxicity (CMR) and specific target organ toxicity after repeated exposure (STOT RE) for the first ten years of the REACH legislation. Our knowledge on chemical safety keeps increasing due to the registration obligations under REACH, in combination with proactive actions by registrants and regulatory actions by Authorities, which jointly lead to new testing and critical reassessment of existing studies. The improvements become evident by the constant increase in the number of substances that are self-classified by the registrants for human health endpoints. Moreover, there is a slow but steady increase in the number of substances for which there is at least one experimental study available for the human health endpoints in scope of this analysis. However, the increase is slow given the generally limited data availability at the beginning of REACH. Manual examination of about 350 classified substances reveals that the impact of newly generated data and regulatory action by Authorities is greater for reproductive toxicity than for carcinogenicity or mutagenicity, reflecting the strengthening of the information requirements for reproductive toxicity with the introduction of REACH. The results of the study should inform regulators and policy makers at EU and national level in the discussion on potential changes to information requirements or testing strategies under REACH.
Subjects
Animal Testing Alternatives/legislation & jurisprudence , Carcinogenicity Tests , Mutagenicity Tests , Organic Chemicals/adverse effects , Animals , European Union , Humans , Organic Chemicals/administration & dosage
ABSTRACT
We investigate the ability of current ab initio crystal structure prediction techniques to identify the polymorphs of 5-methyl-2-[(2-nitrophenyl)amino]-3-thiophenecarbonitrile, also known as ROY because of the red, orange and yellow colours of its polymorphs. We use a methodology combining the generation of a large number of structures based on a computationally inexpensive model using the CrystalPredictor global search algorithm, and the further minimization of the most promising of these structures using the CrystalOptimizer local minimization algorithm which employs an accurate, yet efficiently constructed, model based on isolated-molecule quantum-mechanical calculations. We demonstrate that this approach successfully predicts the seven experimentally resolved structures of ROY as lattice-energy minima, with five of these structures being within the 12 lowest energy structures predicted. Some of the other low-energy structures identified are likely candidates for the still unresolved polymorphs of this molecule. The relative stability of the predicted structures only partially matches that of the experimentally resolved polymorphs. The worst case is that of polymorph ON, whose relative energy with respect to Y is overestimated by 6.65 kJ mol⁻¹. This highlights the need for further developments in the accuracy of the energy calculations.
ABSTRACT
Following on from the success of the previous crystal structure prediction blind tests (CSP1999, CSP2001, CSP2004 and CSP2007), a fifth such collaborative project (CSP2010) was organized at the Cambridge Crystallographic Data Centre. A range of methodologies was used by the participating groups in order to evaluate the ability of the current computational methods to predict the crystal structures of the six organic molecules chosen as targets for this blind test. The first four targets, two rigid molecules, one semi-flexible molecule and a 1:1 salt, matched the criteria for the targets from CSP2007, while the last two targets belonged to two new challenging categories - a larger, much more flexible molecule and a hydrate with more than one polymorph. Each group submitted three predictions for each target it attempted. There was at least one successful prediction for each target, and two groups were able to successfully predict the structure of the large flexible molecule as their first place submission. The results show that while not as many groups successfully predicted the structures of the three smallest molecules as in CSP2007, there is now evidence that methodologies such as dispersion-corrected density functional theory (DFT-D) are able to reliably do so. The results also highlight the many challenges posed by more complex systems and show that there are still issues to be overcome.
Subjects
Crystallography, X-Ray/methods , Organic Chemicals/chemistry , Databases, Factual , Models, Molecular
ABSTRACT
Crystal structure prediction for organic molecules requires both the fast assessment of thousands to millions of crystal structures and the greatest possible accuracy in their relative energies. We describe a crystal lattice simulation program, DMACRYS, emphasizing the features that make it suitable for use in crystal structure prediction for pharmaceutical molecules using accurate anisotropic atom-atom model intermolecular potentials based on the theory of intermolecular forces. DMACRYS can optimize the lattice energy of a crystal, calculate the second derivative properties, and reduce the symmetry of the space group to move away from a transition state. The calculated terahertz frequency k = 0 rigid-body lattice modes and elastic tensor can be used to estimate free energies. The program uses a distributed multipole electrostatic model (Q, t = 00,...,44s) for the electrostatic fields, and can use anisotropic atom-atom repulsion models, damped isotropic dispersion up to R⁻¹⁰, as well as a range of empirically fitted isotropic exp-6 atom-atom models with different definitions of atomic types. A new feature is that an accurate model for the induction energy contribution to the lattice energy has been implemented that uses atomic anisotropic dipole polarizability models (α, t = (10,10)...(11c,11s)) to evaluate the changes in the molecular charge density induced by the electrostatic field within the crystal. It is demonstrated, using the four polymorphs of the pharmaceutical carbamazepine C₁₅H₁₂N₂O, that whilst reproducing crystal structures is relatively easy, calculating the polymorphic energy differences to the accuracy of a few kJ mol⁻¹ required for applications is very demanding of assumptions made in the modelling. Thus DMACRYS enables the comparison of both known and hypothetical crystal structures as an aid to the development of pharmaceuticals and other speciality organic materials, and provides a tool to develop the modelling of the intermolecular forces involved in molecular recognition processes.
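For readers unfamiliar with the "isotropic exp-6" term used in this abstract, the sketch below evaluates the standard exp-6 (Buckingham-type) pair form, U(R) = A exp(-B R) - C R⁻⁶, and sums it over a list of atom-atom separations. It is only an illustration of the functional form: it is not DMACRYS code, the function names are hypothetical, and the parameter values and units used in the checks are arbitrary rather than fitted.

```python
import math


def exp6_energy(r, a, b, c):
    """Isotropic exp-6 pair energy: exponential repulsion minus R^-6 dispersion."""
    if r <= 0:
        raise ValueError("separation must be positive")
    return a * math.exp(-b * r) - c / r**6


def pairwise_sum(separations, a, b, c):
    """Sum the exp-6 energy over atom-atom separations (one atom-type pair only)."""
    return sum(exp6_energy(r, a, b, c) for r in separations)


# Arbitrary, non-physical parameters chosen only to make the arithmetic easy to follow.
print(pairwise_sum([1.0, 2.0], a=0.0, b=1.0, c=64.0))
```

A real lattice-energy calculation would use separate parameters for each pair of atom types and add the electrostatic and induction contributions that the abstract describes.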
ABSTRACT
Following the computation of a lattice energy landscape which predicted that there should be more stable, denser forms of (R)-1-phenylethylammonium-(S)-2-phenylbutyrate, crystallizations from a range of solvents were performed to search for other polymorphs and investigate the possibility that the known P4₁ structure could be a hydrate. Extensive crystallization experiments from a wide range of solvents gave fine needles or microcrystalline samples. A redetermination of the P4₁ structure by powder X-ray diffraction located all protons, and in conjunction with other experimental and computational evidence showed that the structure was anhydrous. Evidence for two additional forms was found as mixtures with form I. These include an orthorhombic form, possibly a Z' = 3 polymorph, and another as yet unidentified form obtained as a minor component from dichloromethane solution. However, both these forms appear to be metastable with respect to form I (P4₁), which is therefore probably the most thermodynamically stable form that can be crystallized from solution under ambient conditions. This determination of the solid state behavior of the less readily crystallized member of the diastereomeric salt system (R)-1-phenylethylammonium-(R/S)-2-phenylbutyrate provides a challenge to the theoretical modeling to explain its ideal resolution behavior.
Subjects
Phenylbutyrates/chemistry , Crystallization , Crystallography, X-Ray/methods , Hydrogen Bonding , Methylene Chloride/chemistry , Models, Chemical , Models, Molecular , Models, Theoretical , Molecular Conformation , Software , Solvents/chemistry , Stereoisomerism , Water/chemistry , X-Ray Diffraction
ABSTRACT
We report on the organization and outcome of the fourth blind test of crystal structure prediction, an international collaborative project organized to evaluate the present state of computational methods for predicting the crystal structures of small organic molecules. Fourteen research groups took part, using a variety of methods to generate and rank the most likely crystal structures for four target systems: three single-component crystal structures and a 1:1 cocrystal. Participants were challenged to predict the crystal structures of the four systems, given only their molecular diagrams, while the recently determined but as-yet unpublished crystal structures were withheld by an independent referee. Three predictions were allowed for each system. The results demonstrate a dramatic improvement in rates of success over previous blind tests; in total, there were 13 successful predictions and, for each of the four targets, at least two groups correctly predicted the observed crystal structure. The successes include one participating group who correctly predicted all four crystal structures as their first ranked choice, albeit at a considerable computational expense. The results reflect important improvements in modelling methods and suggest that, at least for the small and fairly rigid types of molecules included in this blind test, such calculations can be constructively applied to help understand crystallization and polymorphism of organic molecules.
Subjects
Acrolein/chemistry , Benzothiazoles/chemistry , Computer Simulation , Fluorobenzenes/chemistry , Thiones/chemistry , Crystallization , Crystallography, X-Ray , Models, Molecular , Molecular Structure , Quantum Theory
ABSTRACT
This paper reports a novel methodology for the free-energy minimization of crystal structures exhibiting strong, anisotropic interactions due to hydrogen bonding. The geometry of the thermally expanded cell was calculated by exploiting the dependence of the free-energy derivatives with respect to cell lengths and angles on the average pressure tensor computed in short molecular dynamics simulations. All dynamic simulations were performed with an elaborate anisotropic potential based on a distributed multipole analysis of the isolated molecule charge density. Changes in structure were monitored via simulated X-ray diffraction patterns. The methodology was used to minimize the free energy at ambient conditions of a set of experimental and hypothetical 5-fluorouracil crystal structures, generated in a search for lattice-energy minima with the same model potential. Our results demonstrate that the majority (approximately 75%) of lattice-energy minima are thermally stable at ambient conditions, and hence, the free-energy (like the lattice-energy) surface is complex and highly undulating. Metadynamics trajectories (Laio, A.; Parrinello, M. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12562) started from the free-energy minima only produced transitions that preserved the hydrogen-bonding motif, and thus, further developments are needed for this method to efficiently explore such free-energy surfaces. The existence of so many free-energy minima, with large barriers for the alteration of the hydrogen-bonding motif, is consistent with the range of motifs observed in crystal structures of 5-fluorouracil and other 5-substituted uracils.
ABSTRACT
The predicted stability differences of the conformational polymorphs of oxalyl dihydrazide and ortho-acetamidobenzamide are unrealistically large when the modeling of intermolecular energies is solely based on the isolated-molecule charge density, neglecting charge density polarization. Ab initio calculated crystal electron densities showed qualitative differences depending on the spatial arrangement of molecules in the lattice with the greatest variations observed for polymorphs that differ in the extent of inter- and intramolecular hydrogen bonding. We show that accounting for induction dramatically alters the calculated stability order of the polymorphs and reduces their predicted stability differences to be in better agreement with experiment. Given the challenges in modeling conformational polymorphs with marked differences in hydrogen bonding geometries, we performed an extensive periodic density functional study with a range of exchange-correlation functionals using both atomic and plane wave basis sets. Although such electronic structure methods model the electrostatic and polarization contributions well, the underestimation of dispersion interactions by current exchange-correlation functionals limits their applicability. The use of an empirical dispersion-corrected density functional method consistently reduces the structural deviations between the experimental and energy minimized crystal structures and achieves plausible stability differences. Thus, we have established which types of models may give worthwhile relative energies for crystal structures and other condensed phases of flexible molecules with intra- and intermolecular hydrogen bonding capabilities, advancing the possibility of simulation studies on polymorphic pharmaceuticals.
Subjects
Hydrogen Bonding , Models, Molecular , Organic Chemicals/chemistry , Computer Simulation , Crystallization , Molecular Conformation , Pharmaceutical Preparations/chemistry
ABSTRACT
The crystal structures, including two new polymorphs, of three diastereomerically related salt pairs formed by (R)-1-phenylethylammonium (1) with (S&R)-2-phenylpropanoate (2), (S&R)-2-phenylbutyrate (3), and (S&R)-mandelate (4) ions were characterized by low-temperature single crystal or powder X-ray diffraction. Thermal, solubility, and solution calorimetry measurements were used to determine the relative stabilities of the salt pairs and polymorphs. These were qualitatively predicted by lattice energy calculations combining realistic models for the dominant intermolecular electrostatic interactions and ab initio calculations for the ions' conformational energies due to the distortion of their geometries by the crystal packing forces. Crystal structure prediction studies were also performed for the highly polymorphic diastereomeric salt pair (R)-1-phenylethylammonium-(S&R)-2-phenylbutyrate (1-3) in an attempt to predict the separation efficiency without relying on experimental information. This joint experimental and computational investigation provides a stringent test for the reliability of lattice modeling approaches to explain the origins of chiral resolution via diastereomer formation (Pasteurian resolution). The further developments required for the computational screening of single-enantiomer resolving agents to achieve optimal chiral separation are discussed.
ABSTRACT
Progesterone has been known to be polymorphic for over 70 years, and crystallization conditions for the production of both experimentally characterized polymorphs have been repeatedly reported in the literature up to 1975. Nevertheless, our attempts to produce crystals of the metastable form 2 suitable for single crystal X-ray diffraction failed until the structurally related molecule pregnenolone was introduced as an additive into the crystallization solution. Accurate low temperature crystal structures were obtained for forms 1 and 2, pregnenolone and a newly discovered pregnenolone-progesterone co-crystal, which appeared concomitantly with progesterone forms 1 and 2. Computational work based on the experimental crystal structures and those generated by a search for low energy structures showed that the crystallization of enantiomerically pure progesterone results in a more strained conformation compared with the racemate due to the rotation of the acetyl and 21-methyl groups. The role of impurities or additives in influencing crystallization outcome is discussed.
Subjects
Crystallography, X-Ray , Excipients/chemistry , Mathematical Computing , Models, Molecular , Pregnenolone/chemistry , Progesterone/chemistry , Technology, Pharmaceutical/methods , Chemistry, Pharmaceutical , Crystallization , Drug Compounding , Drug Stability , Molecular Conformation , Molecular Structure , Software , Temperature
ABSTRACT
A computational prediction that mixing the synthetic mirror image of progesterone with its natural form would produce a specific racemic crystal structure was validated.
Subjects
Computer Simulation , Progesterone/chemistry , Crystallization , Crystallography, X-Ray , Models, Molecular , Molecular Conformation , Stereoisomerism
ABSTRACT
A methodology for the computational prediction of the crystal structures and resolution efficiency for diastereomeric salt pairs is developed by considering the polymorphic system of the diastereomeric salt pair (R)-1-phenylethylammonium (R/S)-2-phenylpropanoate. To alleviate the mathematical complexity of the search for minima in the lattice energy due to the presence of two flexible entities in the asymmetric unit, the range of rigid-body lattice energy global optimizations was guided by a statistical analysis of the Cambridge Structural Database for common ion-pair geometries and ion conformations. A distributed multipole model for the dominant electrostatic interactions and high-level ab initio calculations for the intramolecular energy penalty for conformational distortions are used to quantify the relative stabilities of the p- and n-salt forms. While the ab initio prediction of the known structure of the p-salt as the most stable structure was insensitive to minor changes in the rigid-ion conformations considered, the relative stabilities of the known polymorphs and hypothetical structures of the n-salt were very sensitive. Although this paper provides a significant advance over traditional search algorithms and empirical force fields in determining the structures and relative stabilities of diastereomeric salt pairs, the sensitivity of the computed lattice energies to the fine details of the ion conformations overtaxes current computational models and renders the design of diastereomeric resolution processes by computational chemistry a challenging problem.
Subjects
Salts/chemistry , Algorithms , Crystallization , Models, Molecular , Molecular Structure , Stereoisomerism
ABSTRACT
Substances of unknown or variable composition, complex reaction products, or biological materials (UVCBs) have been conventionally described in generic terms. Commonly used substance identifiers are generic names of chemical classes, generic structural formulas, reaction steps, physical-chemical properties, or spectral data. Lack of well-defined structural information has significantly restricted in silico fate and hazard assessment of UVCB substances. A methodology for the structural description of UVCB substances has been developed that allows use of known identifiers for coding, generation, and selection of representative constituents. The developed formats, Generic Simplified Molecular-Input Line-Entry System (G SMILES) and Generic Graph (G Graph), address the need to code, generate, and select representative UVCB constituents; G SMILES is a SMILES-based single line notation coding fixed and variable structural features of UVCBs, whereas G Graph is based on a workflow paradigm that allows generation of constituents coded in G SMILES and end point-specific or nonspecific selection of representative constituents. Structural description of UVCB substances as afforded by the developed methodology is essential for in silico fate and hazard assessment. Data gap filling approaches such as read-across, trend analysis, or quantitative structure-activity relationship modeling can be applied to the generated constituents, and the results can be used to assess the substance as a whole. The methodology also advances the application of category-based data gap filling approaches to UVCB substances.
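To make the idea of generating constituents from a single generic notation more concrete, the sketch below expands a template string with variable repeat counts into plain SMILES strings. This is not the G SMILES syntax or the G Graph workflow, neither of which is specified in the abstract: the `{name}` placeholder convention, the function `enumerate_constituents`, and the fatty-acid example are all hypothetical stand-ins.

```python
from itertools import product


def enumerate_constituents(template, repeats):
    """Expand `template` by repeating each named fragment between lo and hi times.

    `repeats` maps a placeholder name to (fragment, lo, hi); every combination
    of repeat counts yields one SMILES string.
    """
    names = sorted(repeats)
    ranges = [range(repeats[n][1], repeats[n][2] + 1) for n in names]
    results = []
    for counts in product(*ranges):
        filled = {n: repeats[n][0] * count for n, count in zip(names, counts)}
        results.append(template.format(**filled))
    return results


# Hypothetical example: saturated fatty acids with 8 to 10 carbon atoms.
acids = enumerate_constituents("C{chain}C(=O)O", {"chain": ("C", 6, 8)})
for smiles in acids:
    print(smiles)
```

Each generated string could then be passed to a read-across or structure-activity model, and a representative subset selected, which is the role the abstract assigns to the constituent selection step.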
Subjects
Fatty Acids/chemistry , Oils/chemistry , Phenols/chemistry , Plant Extracts/chemistry , Polycyclic Aromatic Hydrocarbons/chemistry , Environmental Restoration and Remediation , Fatty Acids/metabolism , Oils/metabolism , Phenols/metabolism , Plant Extracts/metabolism , Polycyclic Aromatic Hydrocarbons/metabolism , Quantitative Structure-Activity Relationship , Risk Assessment
ABSTRACT
Solvents can significantly alter the rates and selectivity of liquid-phase organic reactions, often hindering the development of new synthetic routes or, if chosen wisely, facilitating routes by improving rates and selectivities. To address this challenge, a systematic methodology is proposed that quickly identifies improved reaction solvents by combining quantum mechanical computations of the reaction rate constant in a few solvents with a computer-aided molecular design (CAMD) procedure. The approach allows the identification of a high-performance solvent within a very large set of possible molecules. The validity of our CAMD approach is demonstrated through application to a classical nucleophilic substitution reaction for the study of solvent effects, the Menschutkin reaction. The results were validated successfully by in situ kinetic experiments. A space of 1,341 solvents was explored in silico, but required quantum-mechanical calculations of the rate constant in only nine solvents, and uncovered a solvent that increases the rate constant by 40%.
Subjects
Computer-Aided Design , Solvents/chemistry , Computer Simulation , Kinetics , Models, Chemical , Structure-Activity Relationship
ABSTRACT
A study of two dihydroxybenzoic acid isomers shows that computational methods can be used to predict hydrate formation, the compound:water ratio and hydrate crystal structures. The calculations also help identify a novel hydrate found in the solid form screening that validates this study.
ABSTRACT
The range of target structures in the fifth international blind test of crystal structure prediction was extended to include a highly flexible molecule, benzyl-(4-(4-methyl-5-(p-tolylsulfonyl)-1,3-thiazol-2-yl)phenyl)carbamate, as a challenge representative of modern pharmaceuticals. Two of the groups participating in the blind test independently predicted the correct structure. The methods they used are described and contrasted, and the implications of the capability to tackle molecules of this complexity are discussed.