ABSTRACT
Pattern recognition and allied multivariate methods provide an approach to interpreting the multivariate data often encountered in analytical chemistry. Widely used methods include mapping and display, discriminant development, clustering, and modeling. Each has been applied to a variety of chemical problems, and examples are given. The results of two recent studies are shown: a classification of subjects as normal or cystic fibrosis heterozygotes, and a simulation of carbon-13 nuclear magnetic resonance chemical shifts by linear model equations.
Subjects
Analytical Chemistry Techniques , Automated Pattern Recognition , Carbon Radioisotopes , Cystic Fibrosis/diagnosis , Genetic Carrier Screening , Humans , Magnetic Resonance Spectroscopy , Structure-Activity Relationship
ABSTRACT
Computerized pattern recognition techniques can be applied to the study of complex chemical communication systems. Analysis of high resolution gas chromatographic concentration patterns of the major volatile components of the scent marks of a South American primate, Saguinus fuscicollis, demonstrates that the concentration patterns can be used to predict the gender and subspecies of unknown donors.
Subjects
Computers , Automated Pattern Recognition , Pheromones/physiology , Sex Attractants/physiology , Animals , Chemical Phenomena , Chemistry , Gas Chromatography , Female , Male , Saguinus/physiology , Scent Glands/physiology , Structure-Activity Relationship
ABSTRACT
Linear discriminant analysis is used to generate models to classify multidrug-resistance reversal (MDRR) agents based on activity. Models are generated and evaluated using MDRR activity values for 609 compounds measured using adriamycin-resistant P388 murine leukemia cells. Structure-based descriptors numerically encode molecular features, which are used in model formation. Two types of models are generated: one type to classify compounds as inactive, moderately active, and active (three-class problem) and one type to classify compounds as inactive or active without considering the moderately active class (two-class problem). Two activity distributions are considered, which differ in the separation between inactive and active compounds. When the separation between inactive and active classes is small, a model based on nine topological descriptors is developed that produces a classification rate of 83.1% correct for an external prediction set. Larger separation between active and inactive classes raises the prediction set classification rate to 92.0% correct using a model with six topological descriptors. Models are further validated through Monte Carlo experiments in which models are generated after class labels have been scrambled. The classification rates achieved demonstrate that the models developed could serve as a screening mechanism to identify potentially useful MDRR agents from large libraries of compounds.
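The label-scrambling validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: a nearest-centroid rule stands in for linear discriminant analysis, the two-descriptor data are synthetic, and every function name here is hypothetical. The idea is that a model fitted to the real labels should classify far better than models fitted to randomly permuted labels.

```python
import random

def centroid(rows):
    """Mean vector of a list of equal-length descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_predict(X, y):
    """Fit a nearest-centroid rule on (X, y) and return its predictions for X.
    Assumption: this simple rule is a stand-in for a linear discriminant."""
    classes = sorted(set(y))
    cents = {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in classes}

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(classes, key=lambda c: dist2(x, cents[c])) for x in X]

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def scrambling_test(X, y, n_trials=200, seed=0):
    """Return the accuracy with real labels and a list of accuracies
    obtained after refitting on randomly permuted labels."""
    rng = random.Random(seed)
    real = accuracy(y, fit_predict(X, y))
    scrambled = []
    for _ in range(n_trials):
        y_perm = y[:]
        rng.shuffle(y_perm)  # a permutation keeps the class sizes unchanged
        scrambled.append(accuracy(y_perm, fit_predict(X, y_perm)))
    return real, scrambled

# Synthetic data (hypothetical): 30 "inactive" and 30 "active" compounds,
# each described by two made-up descriptors.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
X += [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(30)]
y = [0] * 30 + [1] * 30

real_acc, scrambled_accs = scrambling_test(X, y)
mean_scrambled = sum(scrambled_accs) / len(scrambled_accs)
print(f"real labels: {real_acc:.2f}, scrambled labels (mean): {mean_scrambled:.2f}")
```

If the scrambled-label accuracies approached the real-label accuracy, the apparent classification rate would more likely reflect chance correlation than structure-activity information.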
Subjects
Antineoplastic Agents/classification , Multiple Drug Resistance , Antineoplastic Drug Resistance , Animals , Antineoplastic Antibiotics/pharmacology , Antineoplastic Agents/chemistry , Doxorubicin/pharmacology , Leukemia P388 , Linear Models , Mice , Biological Models , Molecular Models , Monte Carlo Method , Cultured Tumor Cells
ABSTRACT
Studies of molecular structure-carcinogenicity relations for a set of 157 aromatic amines are reported. A computer-assisted approach using pattern-recognition methods was used to develop a series of discriminants for the carcinogenic potential of aromatic amines. The 157 compounds were divided into subsets according to tumor site, route of administration, and activity. Sets of calculated molecular structure descriptors were generated that could support linear discriminant functions able to separate sets of active carcinogens from inactive compounds. Prominent among the important structural descriptors were those coding the sizes and shapes of the amines. The pattern-recognition results were not strongly affected by differences in active site, and the study showed that mixed data sets could be used in computer-assisted structure-carcinogenicity studies.
Subjects
Amines/toxicity , Carcinogens/toxicity , Animals , Binding Sites , Computers , Automated Pattern Recognition , Rats , Structure-Activity Relationship
ABSTRACT
Pattern-recognition techniques have been applied to the study of relationships between the molecular structure of nitrosamines and their carcinogenic potential. A set of 150 nitrosamines (112 carcinogenic and 38 noncarcinogenic) was used. Each compound was represented by a set of calculated molecular structure descriptors. Discriminants were found that could separate 146 of the compounds into the two activity classes based on a set of 22 descriptors. Internal consistency checking showed that the 22 descriptors used supported a meaningful discriminant. The results show that sufficient information is contained within the structure of N-nitroso compounds to allow classification into carcinogenic activity classes.
Subjects
Nitroso Compounds/pharmacology , Automated Pattern Recognition , Carcinogens , Computers , Nitrosamines/toxicity , Structure-Activity Relationship
ABSTRACT
N-nitroso compounds, consisting of nitrosamines and nitrosamides, are potentially important in the etiology of human cancer. An attempt to study the molecular structure-carcinogenicity relations of these compounds is reported. A pattern-recognition approach was used to develop predictive ability for carcinogenic potential. A set of 15 calculated molecular structure descriptors was identified that supported a linear discriminant function able to separate 116 carcinogens from 28 noncarcinogens. An overall predictive ability of 91% (93% for carcinogens and 85% for noncarcinogens) was obtained in the randomized testing. This relatively high predictability demonstrates that pattern-recognition methods can be useful in analyzing these compounds for carcinogenic activity. The inclusion of two electronic descriptors implicitly supports the alpha-hydroxylation hypothesis. The relations between the descriptors used and the possible mechanism of action are discussed.
Subjects
Carcinogens , Nitroso Compounds/pharmacology , Amides/pharmacology , Computers , Nitrosamines/pharmacology , Structure-Activity Relationship
ABSTRACT
The relationship between variation in structure and variation in antiinflammatory activity was investigated for 125 steroids whose antiinflammatory activity had previously been determined by using the McKenzie-Stoughton human vasoconstrictor assay. Eighty-eight of the compounds were used in the training stages of analysis. A two-class problem was developed by classifying the compounds as low-to-no potency (37 compounds) or potent-to-very potent (51 compounds) on the basis of their activity relative to that of hydrocortisone butyrate. Thirty-eight different structural variations occurred at six different sites on the steroid nucleus. These variations were coded by a total of 10 descriptors: three indicator descriptors and seven descriptors that coded for the lipophilicity of the substituents at specific sites of variation. Linear discriminant analysis, principal components plots, K nearest neighbor analysis, and statistical measurements of class separation all confirmed that the more potent compounds existed in a region of the data space different from the less potent compounds. This structure-activity relationship was applied, with good results, to the prediction of the activities of 37 compounds that were not used in the preliminary analysis.
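As an illustration of the K nearest neighbor analysis named in this abstract, the sketch below assigns a compound to the class held by the majority of its k closest training compounds in descriptor space. The two-descriptor values and class labels are invented for the example, and `knn_predict` is a hypothetical helper, not part of the study.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance in descriptor space)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: two lipophilicity-like descriptors per compound.
train_X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4],   # low-to-no potency
           [1.5, 1.2], [1.8, 1.6], [1.4, 1.9]]   # potent-to-very potent
train_y = ["low", "low", "low", "potent", "potent", "potent"]

print(knn_predict(train_X, train_y, [1.6, 1.5]))
print(knn_predict(train_X, train_y, [0.2, 0.2]))
```

When the two classes occupy different regions of the data space, as the abstract reports, most compounds share a class with their nearest neighbors, which is why KNN serves as a check on class separation.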
Subjects
Anti-Inflammatory Agents/pharmacology , Humans , Computer-Assisted Numerical Analysis , Steroids , Structure-Activity Relationship , Vasoconstriction/drug effects
ABSTRACT
A structure-activity relations study has been performed on a heterogeneous set of organic compounds to develop predictive ability for carcinogenic potential. The compounds employed came from more than 12 structural classes and numbered 130 carcinogens and 79 noncarcinogens. A set of 28 calculated molecular structure descriptors was identified that supported a linear discriminant function able to completely separate 192 compounds into the carcinogenic and noncarcinogenic classes. A predictive ability of 90% for carcinogens and 78% for noncarcinogens was obtained in randomized testing. The results demonstrate that pattern-recognition methods can be used to analyze a diverse set of compounds each represented by calculated molecular structure descriptors for a common biological activity.
Subjects
Carcinogens/toxicity , Computers , Methods , Molecular Conformation , Quantum Theory , Regression Analysis , Structure-Activity Relationship
ABSTRACT
A pattern-recognition analysis using the ADAPT system was performed on a set of 9-anilinoacridine antitumor agents to determine whether computer-generated descriptors could be used to separate active from inactive compounds. A training set of 213 compounds was chosen by random computer selection from a list of 776 structures. Maximal increase in life span at the LD10 dosage, a response that is difficult to model using traditional Hansch analysis, was used as the measure of biological activity. A set of 18 molecular descriptors, including fragment, substructure environment, and physicochemical property descriptors (molar refraction, partial electronic charge), was identified that could correctly classify 94% of the compounds in the training set (97% of active and 85% of inactive compounds). Eight of the inactive compounds that were misclassified contained amino substituents, suggesting a role for ionization. The weight vector that was obtained from the training set was applied to a prediction set of 50 compounds that were not included in the original analysis and to a set of 69 structures drawn from the recent literature. The prediction set results, ranging from 73 to 86% correct, were lower than those of the training set, but they clearly indicate that pattern-recognition techniques can be useful in the screening of proposed or already existing agents and especially useful for the identification of active compounds.
Subjects
Aminoacridines/pharmacology , Antineoplastic Agents/pharmacology , Antineoplastic Agents/classification , Chemical Phenomena , Physical Chemistry , Molecular Models , Molecular Conformation , Automated Pattern Recognition , Structure-Activity Relationship
ABSTRACT
A compound's biological activity is often determined by complex relationships between its structural components. Such relationships can often be adequately described and exploited only by multivariate structure-activity relationship (SAR) studies that can deal with many variables simultaneously. Pattern recognition (PR) is a multivariate technique that is well suited to the qualitative (active/inactive) data often supplied by biological assays. PR studies of compounds of known activity can yield information that will allow the prediction of the activity of untested compounds. ADAPT is a computerized system that was developed for such PR-SAR studies. A general introduction to this field is presented, and the methodology used for such a study is described in the context of an actual study of mutagenic compounds. The data requirements, descriptor generation, and the details of a PR study are discussed. In addition, the example study was chosen to highlight the problems that may occur if a study is not well formulated and carefully executed. Current work and future plans for computerized mutagen screening are discussed.
Subjects
Computers , Mutagens , Automated Pattern Recognition , Structure-Activity Relationship
ABSTRACT
The relationship between molecular structure and duration of depressant effect for barbiturates was investigated. A data set of 160 5,5'-disubstituted barbiturates with various acyclic substituents was coded using 47 numerical descriptors including fragments, substructures, environmental descriptors, and molecular connectivity indexes. All descriptors were derived directly from the connection tables of the barbiturates. Using an interactive error-correction feedback algorithm, linear discriminant functions were developed that could dichotomize the data set with respect to several thresholds separating longer from shorter acting compounds. Feature selection was used to focus on the relatively few structural descriptors sufficient to support linear separability. For three specific thresholds, 9, 11, and 9 descriptors, respectively, were sufficient. The importance of these descriptors and the utility of the technique are discussed. Predictive abilities of approximately 94% were obtained for known barbiturates of the same general molecular types.
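Error-correction feedback training of a linear discriminant is commonly realized as the perceptron learning rule, in which the weight vector is adjusted only when a training compound falls on the wrong side of the decision surface. The sketch below assumes that reading; the abstract does not give the algorithm's details, and the descriptor values, labels, and function names are hypothetical.

```python
def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Error-correction training of a linear discriminant.
    Labels must be +1 or -1. Weights change only on misclassified samples.
    Returns (weights, bias, converged)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # wrong side of (or exactly on) the boundary
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                errors += 1
        if errors == 0:  # a full pass with no corrections: data are separated
            return w, b, True
    return w, b, False

def classify(w, b, x):
    """Return +1 or -1 according to the side of the discriminant x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# Hypothetical two-descriptor data: +1 = longer acting, -1 = shorter acting.
X = [[2.0, 1.0], [3.0, 2.0], [2.5, 3.0],
     [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.5]]
y = [1, 1, 1, -1, -1, -1]

w, b, converged = train_perceptron(X, y)
print(w, b, converged)
```

Training terminates with zero errors only if the classes are linearly separable in the chosen descriptors, which is why feature selection can be framed as finding the smallest descriptor set that still supports separability.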
Subjects
Barbiturates/pharmacology , Animals , Chemical Phenomena , Physical Chemistry , Chemical Depression , Mice , Biological Models , Rabbits , Rats , Structure-Activity Relationship
ABSTRACT
Using the ADAPT and CHEMLAB-II systems for structure-activity analysis, computer-calculated electronic properties of molecules were used to derive structure-activity relationships for predicting the mutagenicity of a set of substituted acridines in strain TA1537 of the Ames Salmonella assay. A collection of 40 acridines, with a variety of substituents, was examined. A set of four electronic descriptors was found that could be used to correctly classify all but two of the compounds as mutagenic or nonmutagenic. A negative correlation was found between the sum of the Hammett aromatic substituent parameters and the level of mutagenicity of the structures, expressed as log(number of revertants/plate + 1) at a 20-microgram dose. This correlation, however, was not high enough to allow precise estimation of the mutagenicity values.
Subjects
Acridines/pharmacology , Mutagens , Mutation , Mutagenicity Tests , Salmonella typhimurium/drug effects , Structure-Activity Relationship
ABSTRACT
Predictive models are developed for the 13C NMR chemical shifts of the carbon atoms comprising the central rings of 46 trisaccharide compounds. Thirty-nine trisaccharides are used as a training set for development of models using regression analysis and computational neural networks, and seven compounds are used as an external prediction set. The descriptors used in the models are developed directly from the molecular structures of the trisaccharides. Three different methods of descriptor selection are compared. The dependence of the models on the geometries of the trisaccharides is explored. The models developed with geometric descriptors are better than those developed without geometric descriptors, although the latter models are still of a comparable quality. Overall, the best model found is a neural network based on descriptors selected by multiple linear regression.
Subjects
Trisaccharides/chemistry , Algorithms , Carbohydrate Conformation , Carbohydrate Sequence , Linear Models , Magnetic Resonance Spectroscopy , Molecular Models , Molecular Sequence Data , Molecular Structure , Neural Networks (Computer) , Regression Analysis
ABSTRACT
A quantitative structure-activity relationship (QSAR) investigation was done for the acute oral mammalian toxicity (LD50) of a set of 54 organophosphorus pesticide compounds. The compounds were represented with calculated molecular structure descriptors, which encoded their topological, electronic, and geometrical features. Feature selection was done with a genetic algorithm to find subsets of descriptors that would support a high quality computational neural network (CNN) model to link the structural descriptors to the -log(mmol/kg) values for the compounds. The best seven-descriptor non-linear CNN model found had an rms error of 0.22 log units for the training set compounds and 0.25 log units for the prediction set compounds.
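The two quantities reported in this abstract, the -log(mmol/kg) response and the rms error in log units, can be computed as in the sketch below. It assumes the LD50 is originally available in mg/kg and converted with the molecular weight, and that the logarithm is base 10; the abstract states neither, and the function names and numbers are illustrative only.

```python
import math

def neg_log_mmol_per_kg(ld50_mg_per_kg, mol_weight):
    """Convert an LD50 in mg/kg to -log10 of the dose in mmol/kg.
    mg/kg divided by g/mol gives mmol/kg."""
    mmol_per_kg = ld50_mg_per_kg / mol_weight
    return -math.log10(mmol_per_kg)

def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical compounds: (LD50 in mg/kg, molecular weight in g/mol).
compounds = [(30.0, 300.0), (5.0, 250.0), (900.0, 330.0)]
observed = [neg_log_mmol_per_kg(ld50, mw) for ld50, mw in compounds]

# Hypothetical model outputs, in the same log units.
predicted = [1.2, 1.5, -0.3]
print(rms_error(observed, predicted))
```

On this scale more toxic compounds have larger values, and an rms error of 0.25 log units corresponds to predictions that are typically within a factor of about 1.8 of the measured molar dose.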
Subjects
Insecticides/chemistry , Insecticides/toxicity , Organophosphorus Compounds , Animals , Female , Lethal Dose 50 , Male , Biological Models , Chemical Models , Molecular Structure , Monte Carlo Method , Neural Networks (Computer) , Rats , Reproducibility of Results , Structure-Activity Relationship
ABSTRACT
The design and blood-brain barrier crossing of glycine/NMDA receptor antagonists are of significant interest in pharmaceutical research. The use of these antagonists in stroke or seizure reduction has been considered. Measuring the inhibitory concentrations, however, can be time-consuming and costly. The use of quantitative structure-activity relationships to estimate IC(50) values for these receptor antagonists is an attractive alternative to experimental measurement. A data set of 109 compounds with measured log(IC(50)) values ranging from -0.57 to 4.5 is used. Structural information is encoded with numerical descriptors for topological, electronic, geometric, and polar surface properties. A genetic algorithm with a computational neural network fitness evaluator is used to select the best descriptor subsets. Multiple linear regression and computational neural network models are developed. Additionally, a quantitative radial basis function neural network (QRBFNN) was developed with the intent of introducing nonlinearity at a faster speed. A genetic algorithm using the radial basis function network as a fitness evaluator was also developed to search descriptor space for optimum subsets. All models are tested using an external prediction set. The nonlinear computational neural network model has root-mean-square errors of approximately half a log unit.