ABSTRACT
The new QSARINS-Chem standalone version is a free, downloadable, multiplatform tool for the in silico profiling of multiple properties and activities of organic chemicals. Based on the concept of the QSARINS-Chem module embedded in the QSARINS software, it has been fully redesigned and redeveloped in the Java™ language. In addition to a selection of models included in the old module, the new software predicts biotransformation rates and aquatic toxicities of pharmaceuticals and personal care products in multiple organisms, and offers a suite of tools for analyzing predictions. Furthermore, a comprehensive and transparent database of molecular structures is provided. The new QSARINS-Chem standalone version is an informative and solid tool that can support the assessment of the potential hazards and risks of organic chemicals, and is intended for users interested in applying QSARs to generate reliable predictions.
Subjects
Organic Chemicals/chemistry; Quantitative Structure-Activity Relationship; Software; Animals; Fishes; Molecular Structure; Organic Chemicals/toxicity
ABSTRACT
A database of environmentally hazardous chemicals, collected and modeled by QSAR by the Insubria group, is included in the updated version of QSARINS, the software recently proposed for developing and validating QSAR models by the genetic algorithm-ordinary least squares (GA-OLS) method. In this version, a module named QSARINS-Chem includes several datasets of chemical structures and their corresponding endpoints (physicochemical properties and biological activities). The chemicals can be accessed in different ways (CAS number, SMILES, name, and so forth), and their three-dimensional structures can be visualized. Some of the QSAR models previously published by our group have been redeveloped using PaDEL-Descriptor, free online software for molecular descriptor calculation. The new models can easily be applied to predict chemicals without experimental data, while also verifying whether new chemicals fall within the applicability domain. The QSAR Model Reporting Format (QMRF) documents of these models can also be downloaded. Additional chemometric analyses, namely principal component analysis and multicriteria decision making, can be performed for screening and ranking chemicals in order to prioritize the most hazardous ones.
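The applicability-domain check mentioned above is commonly based on leverage (hat) values. The following is a minimal Python sketch. It assumes a linear model with an intercept and the commonly used warning leverage h* = 3p'/n, where p' is the number of model descriptors plus one. It is not the QSARINS implementation, and the function names and data are hypothetical.

```python
import numpy as np


def leverages(X_train: np.ndarray, X_query: np.ndarray) -> np.ndarray:
    """Hat values h = x^T (X^T X)^-1 x for each query row, intercept included."""
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    # Row-wise quadratic form: diagonal of Xq @ xtx_inv @ Xq.T without building the full matrix
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def in_applicability_domain(X_train: np.ndarray, X_query: np.ndarray) -> np.ndarray:
    """True where a query chemical's leverage does not exceed the warning leverage h*."""
    n, p = X_train.shape
    h_star = 3 * (p + 1) / n  # assumed cutoff h* = 3p'/n with p' = p + 1
    return leverages(X_train, X_query) <= h_star


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2))            # hypothetical descriptor matrix
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])  # one ordinary and one extreme chemical
print(in_applicability_domain(X_train, X_new))
```

A full Williams-plot analysis would also inspect standardized residuals. This sketch covers only the structural (leverage) part of the check.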
ABSTRACT
The use of New Approach Methodologies (NAMs), such as Quantitative Structure-Activity Relationship (QSAR) models, is highly recommended by international regulations to speed up the hazard and risk assessment of Endocrine Disruptors, which are known to be linked to a wide spectrum of severe diseases in humans and wildlife. A very sensitive target for these chemicals is the thyroid hormone system, which plays a key role in regulating metabolic and cognitive functions. Several chemicals have been shown to compete with the thyroid hormone thyroxine (T4) for binding to the human thyroid hormone distributor protein transthyretin (hTTR). In this work, we generated three new datasets of T4-hTTR competing potencies for more than 200 heterogeneous chemicals, measured by three different in vitro assays. These datasets were used to develop new regression QSAR models. The best models were thoroughly validated by internal and external validation procedures. The mechanistic interpretation of the selected molecular descriptors highlighted structural features relevant to characterising hTTR binders, such as the presence of hydroxylated and halogenated aromatic rings. Principal component analysis (PCA) was used to rank the studied chemicals by increasing T4-hTTR competing potency. Hydroxylated and halogenated bicyclic aromatic compounds are ranked as the strongest hTTR binders. The new QSARs are useful for screening potential Thyroid Hormone System-Disrupting Chemicals (THSDCs) and for supporting the identification of sustainable alternatives to hazardous chemicals.
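A PCA-based ranking like the one described above can be sketched as projecting autoscaled potency data onto the first principal component. This minimal Python example assumes a matrix with one row per chemical and one column per assay, filled with hypothetical values. It orients PC1 so that the score increases with potency. The example illustrates the idea only and is not the authors' exact procedure.

```python
import numpy as np


def pca_rank(X: np.ndarray, names: list) -> list:
    """Rank chemicals by their PC1 score, from least to most potent."""
    # Autoscale each column (assay) to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal component loadings
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a principal component is arbitrary: flip it so higher potency gives a higher score
    if loadings.sum() < 0:
        loadings = -loadings
    scores = Z @ loadings
    order = np.argsort(scores)  # ascending PC1 score
    return [(names[i], float(scores[i])) for i in order]


# Hypothetical potencies of four chemicals in three assays
potencies = np.array([
    [1.0, 2.0, 1.5],
    [3.0, 5.0, 3.0],
    [2.0, 3.0, 2.5],
    [4.0, 6.0, 4.2],
])
for name, score in pca_rank(potencies, ["A", "B", "C", "D"]):
    print(name, round(score, 2))
```

Orienting PC1 by the sign of its loadings is one simple convention. It works when all assays are positively correlated, as is expected for measures of the same binding potency.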
ABSTRACT
The removal efficiency (RE) of organic contaminants in wastewater treatment plants (WWTPs) is a major determinant of the environmental impact of chemicals discharged to wastewater. In a recent study, non-target screening analysis was applied to quantify the percentage removal efficiency (RE%) of more than 300 polar contaminants, by analyzing influent and effluent samples from a Swedish WWTP with direct injection UHPLC-Orbitrap-MS/MS. Based on subsets extracted from these data, we developed quantitative structure-property relationships (QSPRs) for the prediction of WWTP breakthrough (BT) to the effluent water. QSPRs were developed by multiple linear regression (MLR) and were selected after checking for overfitting and chance correlations by means of bootstrap and randomization procedures. A first model showed good fitting performance, indicating that the proposed approach to developing QSPRs for the prediction of BT is reasonable. By further populating the dataset with similar chemicals using a Tanimoto index approach based on substructure count fingerprints, a second QSPR indicated that the prediction of BT is also applicable to new chemicals sufficiently similar to the training set. Finally, a class-specific QSPR for polyethylene glycols (PEGs) and polypropylene glycols (PPGs) showed BT prediction trends consistent with known degradation pathways.
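The similarity search described above relies on a Tanimoto index computed on substructure count fingerprints. A common generalization of Tanimoto to count vectors is T(a, b) = a·b / (|a|² + |b|² - a·b). For 0/1 fingerprints, it reduces to the usual bit-vector Tanimoto. The Python sketch below assumes this form, although a min/max variant is also used in practice. It also assumes a hypothetical 0.7 similarity cutoff. It does not reproduce the study's fingerprint or threshold.

```python
import numpy as np


def tanimoto_counts(a, b) -> float:
    """Tanimoto similarity for count (or binary) fingerprint vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = float(a @ b)
    denom = float(a @ a + b @ b) - dot
    # Two all-zero fingerprints: treated as identical here (a convention, not a standard)
    return 1.0 if denom == 0 else dot / denom


def nearest_training_similarity(candidate, training_set) -> float:
    """Highest similarity between a candidate and any training chemical."""
    return max(tanimoto_counts(candidate, t) for t in training_set)


SIMILARITY_CUTOFF = 0.7  # hypothetical cutoff for "sufficiently similar"

training = [[2, 1, 0, 1], [0, 0, 3, 1]]  # hypothetical substructure counts
candidate = [2, 1, 0, 0]
best = nearest_training_similarity(candidate, training)
print(best, best >= SIMILARITY_CUTOFF)
```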
Subjects
Water Pollutants, Chemical; Water Purification; Tandem Mass Spectrometry; Water Pollutants, Chemical/analysis; Environmental Monitoring/methods; Wastewater; Water Purification/methods; Waste Disposal, Fluid/methods
ABSTRACT
Xenobiotics released in the environment can be taken up by aquatic and terrestrial organisms and can accumulate at higher concentrations through the trophic chain. Bioaccumulation is therefore one of the PBT (persistence, bioaccumulation, toxicity) properties that authorities require to be assessed when evaluating the risks that chemicals may pose to humans and the environment. Authorities strongly encourage the use of an integrated testing strategy (ITS) and of multiple sources of information, in order to maximize the information available and reduce testing costs. Moreover, given the increasing demand for the development and application of new approaches and alternatives to animal testing, cost-effective in silico tools such as QSAR models are becoming increasingly important. In this study, a large, curated literature database of laboratory-based dietary biomagnification factors (BMF) in fish was used to create externally validated QSARs. The quality categories (high, medium, low) available in the database were used to extract reliable data to train and validate the models, and to further address the uncertainty in low-quality data. This procedure highlighted problematic compounds for which additional experimental effort would be required, such as siloxanes and highly brominated and chlorinated compounds. Two models are proposed as final outputs of this study: one based on good-quality data and the other developed on a larger dataset of consistent log BMFL values, which included lower-quality data. The models had similar predictive ability; however, the second model had a larger applicability domain. These QSARs are based on simple MLR equations that can easily be applied to predict dietary BMFL in fish, and can support bioaccumulation assessment procedures at the regulatory level.
To ease the application and dissemination of these QSARs, they were included, together with their technical documentation (QMRF reports), in QSAR-ME Profiler, a software for QSAR predictions available online.
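Because the final models are simple MLR equations, fitting and applying them is straightforward. The Python sketch below uses hypothetical records, each containing descriptor values, a log BMFL value, and a quality category. It keeps only the better-quality records and fits an ordinary least squares equation with an intercept. The descriptors, data, and quality filter are illustrative and are not those of the study.

```python
import numpy as np


def select_by_quality(records, allowed=("high", "medium")):
    """Keep (descriptors, log_bmfl) pairs whose quality label is in `allowed`."""
    kept = [(x, y) for x, y, quality in records if quality in allowed]
    X = np.array([x for x, _ in kept], dtype=float)
    y = np.array([y for _, y in kept], dtype=float)
    return X, y


def fit_mlr(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply the fitted MLR equation to new descriptor rows."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


# Hypothetical data: two descriptors, log BMFL, and a quality label
records = [
    ([0, 0], 1.0, "high"),
    ([1, 0], 3.0, "high"),
    ([0, 1], 0.0, "medium"),
    ([1, 1], 2.0, "high"),
    ([2, 1], 4.0, "medium"),
    ([5, 5], 100.0, "low"),  # excluded by the quality filter
]
X, y = select_by_quality(records)
coef = fit_mlr(X, y)
print(np.round(coef, 3), predict(coef, [[3, 2]]))
```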
ABSTRACT
The evaluation of regression QSAR model performance, in fitting, robustness, and external prediction, is of pivotal importance. Over the past decade, different external validation parameters have been proposed: Q²F1, Q²F2, Q²F3, r²m, and the Golbraikh-Tropsha method. Recently, our group proposed the concordance correlation coefficient (CCC, Lin) as an external validation parameter for QSAR studies; it simply verifies how small the differences are between experimental data and external data set predictions, independently of their range. In our preliminary work, we demonstrated with thousands of simulated models that CCC is in good agreement with the compared validation criteria (except r²m) when the cutoff values normally applied for accepting QSAR models as externally predictive are used. In this new work, we have studied and compared the general trends of the various criteria with respect to different possible biases (scale and location shifts) in external data distributions, using a wide range of simulated scenarios. This study, further supported by visual inspection of experimental vs predicted data scatter plots, highlighted problems with some criteria. Indeed, with the cutoff suggested by its proponents, r²m could also accept non-predictive models under two of the possible biases (location, and location plus scale), while under scale shift bias it appears to be the most restrictive. Moreover, Q²F1 and Q²F2 showed some problems under one of the possible biases (scale shift). This analysis also allowed us to propose new, recalibrated thresholds for each criterion, intercomparable for the same data scatter, for defining a QSAR model as truly externally predictive under a more precautionary approach.
An analysis of the results revealed that the scatter plot of experimental vs predicted external data must always be evaluated to support the statistical criteria values: in some cases high statistical parameter values could hide models with unacceptable predictions.
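The effect of location and scale shifts on validation criteria can be explored with a small simulation. The Python sketch below computes Lin's CCC and the squared Pearson correlation for four kinds of predictions: exact, shifted by a constant, stretched around the mean, or both. The data are synthetic, and the scenarios only mimic the kinds of bias discussed above. The results show why a high correlation alone can hide a biased model. Under a pure location shift, r² stays at 1, while CCC drops.

```python
import numpy as np


def ccc(y_obs, y_pred) -> float:
    """Lin's concordance correlation coefficient."""
    x = np.asarray(y_obs, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean()) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    return float(2 * sxy / (sxx + syy + len(x) * (x.mean() - y.mean()) ** 2))


def pearson_r2(y_obs, y_pred) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1] ** 2)


y_obs = np.linspace(2.0, 8.0, 25)  # synthetic "experimental" values
center = y_obs.mean()
scenarios = {
    "no bias": y_obs.copy(),
    "location shift": y_obs + 1.0,
    "scale shift": center + 1.5 * (y_obs - center),
    "location + scale": center + 1.5 * (y_obs - center) + 1.0,
}
for name, y_pred in scenarios.items():
    print(f"{name:17s} r2={pearson_r2(y_obs, y_pred):.3f}  CCC={ccc(y_obs, y_pred):.3f}")
```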
ABSTRACT
The bioconcentration factor (BCF) is one of the metrics used to evaluate the potential of a substance to bioaccumulate in aquatic organisms. In this work, linear and non-linear regression QSARs were developed for the prediction of log BCF using different computational approaches, starting from a large and structurally heterogeneous dataset. The new MLR-OLS and ANN regression models show good fit, with R² values of 0.62 and 0.70, respectively, and comparable external predictivity, with R²ext of 0.64 and 0.65 (RMSEext of 0.78 and 0.76), respectively. Furthermore, linear and non-linear classification models were developed using the regulatory threshold BCF > 2000. A class-balanced subset was used to develop the classification models, which were then applied to chemicals not used to create the QSARs. These classification models are characterized by external and internal accuracy of up to 84% and 90%, respectively, and sensitivity and specificity of up to 90% and 80%, respectively. The QSARs presented in this work are validated according to regulatory requirements, and their quality is in line with other tools available for the same endpoint and dataset, with the advantage of low complexity and easy application through the QSAR-ME Profiler software. These QSARs can be used as alternatives to, or in combination with, existing models to support bioaccumulation assessment procedures.
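The classification step can be illustrated in two parts. First, log BCF is converted to a binary label with the BCF > 2000 threshold. Second, accuracy, sensitivity, and specificity are computed from a confusion matrix. The Python sketch below uses hypothetical values. It assumes that class 1 denotes the bioaccumulative (BCF > 2000) class and treats it as the positive class.

```python
import math

LOG_BCF_THRESHOLD = math.log10(2000.0)  # about 3.30


def to_class(log_bcf: float) -> int:
    """1 if BCF > 2000 (positive class), else 0."""
    return int(log_bcf > LOG_BCF_THRESHOLD)


def classification_metrics(y_true, y_pred) -> dict:
    """Accuracy, sensitivity, and specificity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true negative rate
    }


# Hypothetical experimental and predicted log BCF values
observed = [3.9, 4.2, 3.5, 3.4, 2.1, 1.0, 2.8, 3.0, 3.2, 2.5]
predicted = [3.6, 4.0, 3.8, 3.1, 2.4, 1.5, 2.6, 2.9, 3.5, 3.4]
y_true = [to_class(v) for v in observed]
y_pred = [to_class(v) for v in predicted]
print(classification_metrics(y_true, y_pred))
```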
ABSTRACT
Accurate estimates of virus mutation rates are important for understanding viral evolution and for combating viruses. However, estimation methods vary and are often complex. Here, we critically review over 40 original studies and establish criteria to facilitate comparative analyses. The mutation rates of 23 viruses are presented as substitutions per nucleotide per cell infection (s/n/c) and, where necessary, corrected for selection bias using a new statistical method. The resulting rates range from 10⁻⁸ to 10⁻⁶ s/n/c for DNA viruses and from 10⁻⁶ to 10⁻⁴ s/n/c for RNA viruses. As previously shown for DNA viruses, there appears to be a negative correlation between mutation rate and genome size among RNA viruses, but this result requires further experimental testing. Contrary to some suggestions, the mutation rate of retroviruses is not lower than that of other RNA viruses. We also show that nucleotide substitutions are on average four times more common than insertions/deletions (indels). Finally, we provide estimates of the mutation rate per nucleotide per strand copying, which tends to be lower than that per cell infection because some viruses, particularly double-stranded DNA viruses, undergo several rounds of copying per cell. A regularly updated virus mutation rate data set will be available at www.uv.es/rsanjuan/virmut.
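The relationship between the two rate units can be sketched with simple arithmetic. Assume that a cell infection involves c successive rounds of strand copying with the same per-round error rate, and that rates are small enough to add up. Under these assumptions, the rate per cell infection is roughly c times the rate per strand copying. The genomic rate is the per-nucleotide rate multiplied by the genome length. This Python sketch encodes only that simplified assumption, and the example values are hypothetical. The review's actual estimation procedure is more involved.

```python
def rate_per_strand_copying(rate_per_cell_infection: float, copying_rounds: float) -> float:
    """s/n/c -> rate per strand copying, assuming rates add over `copying_rounds` rounds per cell."""
    if copying_rounds < 1:
        raise ValueError("at least one round of copying per cell infection is required")
    return rate_per_cell_infection / copying_rounds


def genomic_rate(rate_per_nucleotide: float, genome_length_nt: int) -> float:
    """Expected substitutions per genome for the same unit of replication."""
    return rate_per_nucleotide * genome_length_nt


# Hypothetical RNA virus: 1e-4 s/n/c, 10 kb genome, 2 copying rounds per cell
mu_c = 1e-4
print(rate_per_strand_copying(mu_c, 2), genomic_rate(mu_c, 10_000))
```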
Subjects
Evolution, Molecular; Mutation; Viruses/genetics; Animals; DNA Viruses/genetics; Humans; Models, Genetic; RNA Viruses/genetics; Time Factors
ABSTRACT
The main utility of QSAR models is their ability to predict activities/properties for new chemicals, and this external prediction ability is evaluated by means of various validation criteria. As a measure for such evaluation, the OECD guidelines have proposed the predictive squared correlation coefficient Q²F1 (Shi et al.). However, other validation criteria have been proposed by other authors: the Golbraikh-Tropsha method, r²m (Roy), Q²F2 (Schüürmann et al.), and Q²F3 (Consonni et al.). In QSAR studies these measures usually agree, but not always, so doubts can arise when contradictory results are obtained. It is likely that none of these criteria is the best in every situation, so a comparative study using simulated data sets is proposed here, using the threshold values suggested by the proponents or those widely used in QSAR modeling. In addition, a different and simple external validation measure, the concordance correlation coefficient (CCC), is proposed and compared with the other criteria. Very large data sets were used to study the general behavior of the validation measures, and the CCC was shown to be the most restrictive. Using simulated data sets of a more realistic size, the CCC was found to agree broadly with the other validation measures, about 96% of the time, in accepting models as predictive, and in almost all the examples it was the most precautionary. The proposed CCC also works well on real data sets, where it seems to be more stable, and helps in making decisions when the validation measures conflict. Since it is conceptually simple, and given its stability and restrictiveness, we propose the concordance correlation coefficient as a complementary, or alternative, more prudent measure of whether a QSAR model is externally predictive.
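The external validation criteria compared above can be written compactly. This Python sketch follows the commonly cited definitions. Q²F1 uses the training-set mean. Q²F2 uses the external-set mean. Q²F3 uses the mean squared external error scaled by the training-set variance. The sketch also includes Lin's CCC. It omits r²m and the Golbraikh-Tropsha conditions, and the example numbers are hypothetical.

```python
import numpy as np


def _press(y_ext, yhat_ext) -> float:
    """Sum of squared external prediction errors."""
    return float(np.sum((np.asarray(y_ext, float) - np.asarray(yhat_ext, float)) ** 2))


def q2_f1(y_ext, yhat_ext, y_train) -> float:
    y_ext = np.asarray(y_ext, float)
    return 1 - _press(y_ext, yhat_ext) / float(np.sum((y_ext - np.mean(y_train)) ** 2))


def q2_f2(y_ext, yhat_ext) -> float:
    y_ext = np.asarray(y_ext, float)
    return 1 - _press(y_ext, yhat_ext) / float(np.sum((y_ext - y_ext.mean()) ** 2))


def q2_f3(y_ext, yhat_ext, y_train) -> float:
    y_train = np.asarray(y_train, float)
    mse_ext = _press(y_ext, yhat_ext) / len(y_ext)
    var_train = float(np.sum((y_train - y_train.mean()) ** 2)) / len(y_train)
    return 1 - mse_ext / var_train


def ccc(y_ext, yhat_ext) -> float:
    """Lin's concordance correlation coefficient on the external set."""
    x = np.asarray(y_ext, float)
    y = np.asarray(yhat_ext, float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    denom = (np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
             + len(x) * (x.mean() - y.mean()) ** 2)
    return float(2 * sxy / denom)


y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_ext, yhat_ext = [2.0, 4.0], [2.5, 3.5]
print(q2_f1(y_ext, yhat_ext, y_train), q2_f2(y_ext, yhat_ext),
      q2_f3(y_ext, yhat_ext, y_train), ccc(y_ext, yhat_ext))
```

For these numbers, the criteria give 0.75, 0.75, 0.875, and 0.8, illustrating that they need not coincide on the same predictions.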
Subjects
Models, Molecular; Quantitative Structure-Activity Relationship
ABSTRACT
The genomes of most virus species have overlapping genes, in which two or more proteins are coded for by the same nucleotide sequence. Several explanations have been proposed for the evolution of this phenomenon, and we test them by comparing the amount of gene overlap in all known virus species. We conclude that gene overlap is unlikely to have evolved as a way of compressing the genome in response to the harmful effects of mutation, because RNA viruses, despite generally having higher mutation rates, have on average less gene overlap than DNA viruses of comparable genome length. However, we do find a negative relationship between overlap proportion and genome length among viruses with icosahedral capsids, but not among those with other capsid types, which we consider easier to enlarge. Our interpretation is that a physical constraint on genome length imposed by the capsid has led to gene overlap evolving as a mechanism for producing more proteins from the same genome length. We consider that these patterns cannot be explained by other factors, namely the possible roles of overlap in transcription regulation and in generating more divergent proteins, or the relationship between gene length and genome length.
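One simple way to quantify the amount of gene overlap in a genome is the fraction of nucleotide positions covered by two or more genes. The Python sketch below assumes that definition, although the study's exact measure may differ. It also assumes 1-based inclusive gene coordinates and a linear genome. Wrap-around genes in circular genomes and strand information are ignored.

```python
def overlap_proportion(genes, genome_length: int) -> float:
    """Fraction of genome positions covered by at least two genes.

    `genes` is a list of (start, end) tuples, 1-based and inclusive.
    """
    # Difference array: +1 where a gene starts, -1 just after it ends
    delta = [0] * (genome_length + 2)
    for start, end in genes:
        if not 1 <= start <= end <= genome_length:
            raise ValueError(f"invalid gene coordinates: {(start, end)}")
        delta[start] += 1
        delta[end + 1] -= 1
    depth = 0
    overlapped = 0
    for pos in range(1, genome_length + 1):
        depth += delta[pos]  # number of genes covering this position
        if depth >= 2:
            overlapped += 1
    return overlapped / genome_length


# Hypothetical 100-nt genome with two genes sharing positions 51-60
print(overlap_proportion([(1, 60), (51, 100)], 100))
```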
Subjects
Capsid/physiology; DNA Viruses/genetics; Gene Homology; Genome, Viral; RNA Viruses/genetics; DNA Viruses/physiology; RNA Viruses/physiology
ABSTRACT
Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Overlapping genes therefore need to be correctly detected, especially since they are now thought to be abundant in eukaryotes as well. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression has been experimentally proven, many of which were not present in databases. We found that, overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may favour the birth of overlapping genes and/or result from selection pressure acting on them.
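The degeneracy of an amino acid is the number of codons that encode it in the standard genetic code. It ranges from 1 for Met and Trp to 6 for Leu, Ser, and Arg. The Python sketch below summarizes a protein's composition by degeneracy class. The cutoffs for "high" (at least 4 codons) and "low" (at most 2 codons) are assumptions for illustration, not the study's definitions. The example sequence is arbitrary.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons)
CODONS_PER_AA = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def degeneracy_composition(protein: str, high_min: int = 4, low_max: int = 2) -> dict:
    """Fractions of high- and low-degeneracy residues, plus mean codons per residue."""
    # Ignore stop symbols and non-standard letters
    residues = [aa for aa in protein.upper() if aa in CODONS_PER_AA]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    n = len(residues)
    high = sum(1 for aa in residues if CODONS_PER_AA[aa] >= high_min)
    low = sum(1 for aa in residues if CODONS_PER_AA[aa] <= low_max)
    mean_codons = sum(CODONS_PER_AA[aa] for aa in residues) / n
    return {"high": high / n, "low": low / n, "mean_codons": mean_codons}


print(degeneracy_composition("MLSRAGPEWK*"))
```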
Subjects
Evolution, Molecular; Gene Homology/genetics; Proteins/genetics; Amino Acid Sequence/genetics; Animals; Genes, Viral/genetics; Mammals/genetics; Mutation; Open Reading Frames/genetics; Principal Component Analysis
ABSTRACT
BACKGROUND, AIM, AND SCOPE: The widespread use of some platinum group elements as catalysts to minimize the emission of pollutants from combustion engines has produced a steadily growing concentration of these elements in the environment; their potential toxicological properties explain the increasing interest in easy routine monitoring. We have found that leaves of Prunus laurus cerasus efficiently collect particulate matter with a size < 60-80 µm, and a simple and reliable procedure was developed to reveal traces of platinum, palladium, and rhodium released from automotive catalysts. The analysis of the dust deposited on the foliage is a direct indicator of traffic pollution. MATERIALS AND METHODS: Leaves of P. laurus cerasus were washed by sonication in a mixture of water and 2-propanol, and the washings were separated by centrifugation, with the liquid being discarded, to yield typically 0.05-1.2 g of dust, which, after mineralization, was directly submitted to atomic absorption analysis. RESULTS: Comparison of the 2007 and 2004-2005 results showed a dramatic reduction in platinum levels and revealed that palladium is now the main component of this traffic-related pollution. DISCUSSION: The results are consistent with the increasing diffusion of diesel-engine cars, whose catalysts are made of Pt and/or Pd alone, and give significant insight into the recent evolution in catalyst design, which replaces platinum with palladium. CONCLUSIONS: The proposed analytical procedure is simple, with short preparation times, and greatly reduces matrix effects, so that atomic absorption spectroscopy can easily detect the three noble metals at the ng/g level in the dust. RECOMMENDATION AND PERSPECTIVES: The results clearly show that Pd concentrations have increased over time, which must be a cause for concern.
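Reporting results at the ng/g level requires scaling the metal concentration measured in the mineralized digest by the digest volume and the mass of dust. The Python sketch below shows this standard arithmetic with a blank subtraction. The digest volume, blank, and readings are hypothetical because the abstract does not report them.

```python
def metal_in_dust_ng_per_g(digest_ng_per_ml: float, digest_volume_ml: float,
                           dust_mass_g: float, blank_ng_per_ml: float = 0.0) -> float:
    """Concentration of a metal in the dust (ng/g) from the digest reading."""
    if dust_mass_g <= 0:
        raise ValueError("dust mass must be positive")
    # Blank-corrected amount of metal in the whole digest (ng), divided by dust mass (g)
    return (digest_ng_per_ml - blank_ng_per_ml) * digest_volume_ml / dust_mass_g


# Hypothetical Pd reading: 2.0 ng/mL in a 25 mL digest of 0.5 g of dust, blank 0.4 ng/mL
print(metal_in_dust_ng_per_g(2.0, 25.0, 0.5, blank_ng_per_ml=0.4))
```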