ABSTRACT
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. A massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic, and more than 130 000 COVID-19-related research papers have been published in peer-reviewed journals or deposited in preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further note that, after the first year of the COVID-19 pandemic, drug repurposing does not appear to have produced rapid and global solutions. However, several known drugs have been used in the clinic to cure COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much needed therapeutics for COVID-19.
Subject(s)
COVID-19 Drug Treatment , Computer Simulation , Drug Design , Drug Discovery/methods , Drug Repositioning , Antiviral Agents/therapeutic use , COVID-19/virology , Clinical Trials as Topic , Humans , Pandemics , SARS-CoV-2/drug effects
ABSTRACT
Simplified molecular input line entry system (SMILES)-based deep learning models are slowly emerging as an important research topic in cheminformatics. In this study, we introduce SMILES pair encoding (SPE), a data-driven tokenization algorithm. SPE first learns a vocabulary of high-frequency SMILES substrings from a large chemical dataset (e.g., ChEMBL) and then tokenizes SMILES based on the learned vocabulary for the actual training of deep learning models. SPE augments the widely used atom-level tokenization by adding human-readable and chemically explainable SMILES substrings as tokens. Case studies show that SPE can achieve superior performance on both molecular generation and quantitative structure-activity relationship (QSAR) prediction tasks. In particular, the SPE-based generative models outperformed the atom-level tokenization model in terms of novelty, diversity, and ability to resemble the training set distribution. The performance of SPE-based QSAR prediction models was evaluated using 24 benchmark datasets, where SPE consistently either matched or outperformed atom-level and k-mer tokenization. Therefore, SPE could be a promising tokenization method for SMILES-based deep learning models. An open-source Python package, SmilesPE, was developed to implement this algorithm and is now freely available at https://github.com/XinhaoLi74/SmilesPE.
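The two stages described in this abstract (learn frequent substrings, then tokenize with them) can be illustrated with a byte-pair-encoding-style merge loop. The sketch below is an assumption made for illustration only: it does not use or reproduce the SmilesPE package, and the regular expression, function names, and `min_frequency` parameter are hypothetical.

```python
import re
from collections import Counter

# Atom-level SMILES tokenizer. The pattern is a simplified, commonly seen
# regex and is an assumption, not the one shipped with SmilesPE.
ATOM_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|@@|[BCNOPSFIbcnops]|[=#$/\\\-+()%.:~*@]|\d)"
)


def atom_tokenize(smiles):
    """Split a SMILES string into atom-level tokens."""
    return ATOM_PATTERN.findall(smiles)


def merge_pair(tokens, left, right):
    """Replace every adjacent (left, right) pair with the fused token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(corpus, num_merges, min_frequency=2):
    """Learn an ordered list of frequent adjacent-token pairs from a corpus."""
    sequences = [atom_tokenize(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        (left, right), freq = pair_counts.most_common(1)[0]
        if freq < min_frequency:
            break
        merges.append((left, right))
        sequences = [merge_pair(seq, left, right) for seq in sequences]
    return merges


def pair_tokenize(smiles, merges):
    """Tokenize at atom level, then apply the learned merges in order."""
    tokens = atom_tokenize(smiles)
    for left, right in merges:
        tokens = merge_pair(tokens, left, right)
    return tokens


toy_corpus = ["CCO", "CCN", "CCC"]  # toy data, not ChEMBL
toy_merges = learn_merges(toy_corpus, num_merges=1)
toy_tokens = pair_tokenize("CCO", toy_merges)
```

On this toy corpus the pair `("C", "C")` is the most frequent, so `CCO` is split into the substring token `CC` followed by `O`, whereas atom-level tokenization would give three tokens.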
Subject(s)
Deep Learning , Algorithms , Cheminformatics , Humans , Quantitative Structure-Activity Relationship
ABSTRACT
Well-known 4-hydroxycoumarin derivatives, such as warfarin, act as inhibitors of the vitamin K epoxide reductase (VKOR) and are used as anticoagulants. Mutations of the VKOR enzyme can lead to resistance to those compounds, which has been a problem in using them as medicines or rodenticides. Most of these mutations lie in the vicinity of potential warfarin-binding sites within the ER-luminal loop structure (Lys30, Phe55) and the transmembrane helix (Tyr138). However, a VKOR mutation found in Tokyo in warfarin-resistant rats does not follow that pattern (Leu76Pro), and its effect on VKOR function and structure remains unclear. We conducted both in vitro kinetic analyses and in silico docking studies to characterize the VKOR mutant. On the one hand, resistant rats (R-rats) showed a 37.5-fold increased IC50 value to warfarin when compared to susceptible rats (S-rats); on the other hand, R-rats showed a 16.5-fold lower basal VKOR activity (Vmax/Km). Docking calculations showed that the mutated VKOR of R-rats has a decreased affinity for warfarin. Molecular dynamics simulations further revealed that VKOR-associated warfarin was more exposed to solvent in R-rats and that key interactions between Lys30, Phe55, and warfarin were less favored. This study concludes that a single mutation of VKOR at position 76 leads to a significant resistance to warfarin by modifying the types and numbers of intermolecular interactions between the two.
Subject(s)
Rodenticides , Warfarin , Animals , Drug Resistance/genetics , Mutation , Rats , Rodenticides/toxicity , Vitamin K Epoxide Reductases/genetics , Warfarin/pharmacology
ABSTRACT
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Subject(s)
Pharmaceutical Chemistry/methods , Drug-Related Side Effects and Adverse Reactions/metabolism , Pharmaceutical Preparations/chemistry , Algorithms , Animals , Artificial Intelligence , Factual Databases , Drug Design , 20th Century History , 21st Century History , Humans , Molecular Models , Quantitative Structure-Activity Relationship , Quantum Theory , Reproducibility of Results
ABSTRACT
Correction for 'QSAR without borders' by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.
ABSTRACT
Reliable in silico approaches to replace animal testing for the evaluation of potential acute toxic effects are in high demand among regulatory agencies. In particular, quantitative structure-activity relationship (QSAR) models have been used to rapidly assess chemical-induced toxicity using either continuous (regression) or discrete (classification) predictions. However, it is often unclear how those different types of models can complement and potentially help each other to afford the best prediction accuracy for a given chemical. This paper presents a novel, dual-layer hierarchical modeling method to fully integrate regression and classification QSAR models for assessing rat acute oral systemic toxicity, with respect to regulatory classifications of concern. The first layer of independent regression, binary, and multiclass models (base models) was built solely using computed chemical descriptors/fingerprints. Then, a second layer of models (hierarchical models) was built by stacking all the cross-validated out-of-fold predictions from the base models. All models were validated using an external test set, and we found that the hierarchical models outperformed the base models for all three end points. The hierarchical quantitative structure-activity relationship (H-QSAR) modeling method represents a promising approach for chemical toxicity prediction and more generally for stacking and blending individual QSAR models into more predictive ensemble models.
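To make the stacking step concrete, the sketch below builds the matrix of cross-validated out-of-fold predictions on which a second-layer model would be trained. It is a minimal illustration under stated assumptions: the two toy base learners (a mean predictor and a one-dimensional nearest-neighbour predictor), the fold assignment, and all function names are hypothetical and are not the models or descriptors used in the study.

```python
def k_fold_indices(n_samples, n_folds):
    """Deterministic folds: sample i goes to fold i % n_folds (no shuffling)."""
    return [
        [i for i in range(n_samples) if i % n_folds == fold]
        for fold in range(n_folds)
    ]


def fit_mean(train_x, train_y):
    """Toy base learner: always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def fit_nearest_neighbour(train_x, train_y):
    """Toy base learner: returns the label of the closest training point (1D)."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[nearest]
    return predict


def out_of_fold_predictions(xs, ys, fit_functions, n_folds=5):
    """One row per sample, one column per base model.

    Each entry is predicted by a model that never saw that sample, so the
    matrix can be used as training input for a second-layer model without
    leaking the sample's own label.
    """
    n_samples = len(xs)
    oof = [[None] * len(fit_functions) for _ in range(n_samples)]
    for test_idx in k_fold_indices(n_samples, n_folds):
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n_samples) if i not in held_out]
        train_y = [ys[i] for i in range(n_samples) if i not in held_out]
        for column, fit in enumerate(fit_functions):
            predict = fit(train_x, train_y)
            for i in test_idx:
                oof[i][column] = predict(xs[i])
    return oof


toy_x = [0.0, 1.0, 2.0, 3.0]  # toy descriptor values
toy_y = [0.0, 1.0, 2.0, 3.0]  # toy end point values
toy_oof = out_of_fold_predictions(
    toy_x, toy_y, [fit_mean, fit_nearest_neighbour], n_folds=2
)
```

A second-layer learner would then take the rows of `toy_oof` as its features and `toy_y` as its target. In practice the base models would be regression, binary, and multiclass learners, and an external test set would be held back from both layers.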
Subject(s)
Organic Chemicals/toxicity , Quantitative Structure-Activity Relationship , Oral Administration , Algorithms , Animals , Molecular Models , Molecular Structure , Organic Chemicals/administration & dosage , Rats , Regression Analysis
ABSTRACT
Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.
Subject(s)
Computational Biology/methods , Genetic Association Studies/methods , DNA Sequence Analysis/methods , Angiopoietin-Like Protein 4/genetics , Cholesterol Ester Transfer Proteins/genetics , Computer Simulation , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Humans , Genetic Models , Proprotein Convertase 9/genetics , Tertiary Protein Structure , Risk Factors
ABSTRACT
Since its inception, the main goal of the lipidomics field has been to characterize lipid species and their respective biological roles. However, difficulties in both full speciation and biological interpretation have rendered these objectives extremely challenging and as a result, limited our understanding of lipid mechanisms and dysregulation. While mass spectrometry-based advancements have significantly increased the ability to identify lipid species, less progress has been made surrounding biological interpretations. We have therefore developed a Structural-based Connectivity and Omic Phenotype Evaluations (SCOPE) cheminformatics toolbox to aid in these evaluations. SCOPE enables the assessment and visualization of two main lipidomic associations: structure/biological connections and metadata linkages either separately or in tandem. To assess structure and biological relationships, SCOPE utilizes key lipid structural moieties such as head group and fatty acyl composition and links them to their respective biological relationships through hierarchical clustering and grouped heatmaps. Metadata arising from phenotypic and environmental factors such as age and diet is then correlated with the lipid structures and/or biological relationships, utilizing Toxicological Prioritization Index (ToxPi) software. Here, SCOPE is demonstrated for various applications from environmental studies to clinical assessments to showcase new biological connections not previously observed with other techniques.
Subject(s)
Cheminformatics , Lipidomics , Lipids , Mass Spectrometry , Phenotype
ABSTRACT
Imatinib, a 2-phenylaminopyrimidine-based BCR-ABL tyrosine kinase inhibitor, is a highly effective drug for treating Chronic Myeloid Leukemia (CML). However, cases of drug resistance are constantly emerging due to various mutations in the ABL kinase domain; thus, it is crucial to identify novel bioactive analogues. Reliable QSAR models and molecular docking protocols have been shown to facilitate the discovery of new compounds from chemical libraries prior to experimental testing. However, as the vast majority of QSAR models rely strictly on 2D descriptors, the rise of 3D descriptors directly computed from molecular dynamics simulations offers new opportunities to potentially augment the reliability of QSAR models. Herein, we employed molecular docking and molecular dynamics on a large series of Imatinib derivatives and developed an ensemble of QSAR models relying on deep neural nets (DNN) and hybrid sets of 2D/3D/MD descriptors in order to predict the binding affinity and inhibition potencies of those compounds. Through rigorous validation tests, we showed that our DNN regression models achieved excellent external prediction performances for the pKi data set (n = 555, R2 ≥ 0.71 and MAE ≤ 0.85) and the pIC50 data set (n = 306, R2 ≥ 0.54 and MAE ≤ 0.71) with strict validation protocols based on external test sets and 10-fold native and nested cross validations. Interestingly, the best DNN and random forest models performed similarly across all descriptor sets. In fact, for this particular series of compounds, our external test results suggest that incorporating additional 3D protein-ligand binding site fingerprints, descriptors, or even MD time-series descriptors did not significantly improve the overall R2 but lowered the MAE of DNN QSAR models. Those augmented models could still help in identifying and understanding the key dynamic protein-ligand interactions to be optimized for further molecular design.
Subject(s)
Benchmarking , Quantitative Structure-Activity Relationship , Imatinib Mesylate/pharmacology , Molecular Docking Simulation , Reproducibility of Results
ABSTRACT
Perfluoroalkyl and polyfluoroalkyl substances (PFASs) pose a substantial threat as endocrine disruptors, and thus early identification of those that may interact with steroid hormone receptors, such as the androgen receptor (AR), is critical. In this study we screened 5,206 PFASs from the CompTox database against the different binding sites on the AR using both molecular docking and machine learning techniques. We developed support vector machine models trained on Tox21 data to classify the active and inactive PFASs for AR using different chemical fingerprints as features. The maximum accuracy and Matthews correlation coefficient (MCC) were 95.01% and 0.76, respectively, based on MACCS fingerprints (MACCSFP). The combination of docking-based screening and machine learning models identified 29 PFASs that have strong potential for activity against the AR and should be considered priority chemicals for biological toxicity testing.
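Accuracy and MCC can diverge when actives are much rarer than inactives, which is why both are reported. The sketch below computes the two metrics from a binary confusion matrix using their standard definitions. The counts are invented for illustration and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention for
    the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced screen: 8 actives and 92 inactives.
example_counts = dict(tp=5, tn=90, fp=2, fn=3)
example_accuracy = accuracy(**example_counts)
example_mcc = matthews_corrcoef(**example_counts)
```

With these invented counts the accuracy is 95%, yet the MCC is noticeably lower, because three of the eight actives are missed and MCC penalizes errors in the minority class more visibly than accuracy does.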
Subject(s)
Endocrine Disruptors , Fluorocarbons , Endocrine Disruptors/analysis , Endocrine Disruptors/toxicity , Fluorocarbons/toxicity , Machine Learning , Mass Screening , Molecular Docking Simulation , Androgen Receptors
ABSTRACT
Motivation: Easily navigating chemical space has become more important due to the increasing size and diversity of publicly-accessible databases such as DrugBank, ChEMBL or Tox21. To do so, modelers typically rely on complex projection techniques using molecular descriptors computed for all the chemicals to be visualized. However, the multiple cheminformatics steps required to prepare, characterize, compute and explore those molecules, are technical, typically necessitate scripting skills, and thus represent a real obstacle for non-specialists. Results: We developed the ChemMaps.com webserver to easily browse, navigate and mine chemical space. The first version of ChemMaps.com features more than 8000 approved, in development, and rejected drugs, as well as over 47 000 environmental chemicals. Availability and implementation: The webserver is freely available at http://www.chemmaps.com.
Subject(s)
Computational Biology , Pharmaceutical Databases , Molecular Structure , Software , Web Browser
ABSTRACT
Ion mobility spectrometry (IMS) is a widely used analytical technique providing rapid gas phase separations. IMS alone is useful, but its coupling with mass spectrometry (IMS-MS) and various front-end separation techniques has greatly increased the molecular information achievable from different omic analyses. IMS-MS analyses are specifically gaining attention for improving metabolomic, lipidomic, glycomic, proteomic and exposomic analyses by increasing measurement sensitivity (e.g. S/N ratio), reducing the detection limit, and amplifying peak capacity. Numerous studies including national security-related analyses, disease screenings and environmental evaluations are illustrating that IMS-MS is able to extract information not possible with MS alone. Furthermore, IMS-MS has shown great utility in salvaging molecular information for low abundance molecules of interest when high concentration contaminant ions are present in the sample by reducing detector suppression. This review highlights how IMS-MS is currently being used in omic analyses to distinguish structurally similar molecules, isomers, molecular classes and contaminant ions.
ABSTRACT
MOTIVATION: There is a growing interest in the broad use of Augmented Reality (AR) and Virtual Reality (VR) in the fields of bioinformatics and cheminformatics to visualize complex biological and chemical structures. AR and VR technologies allow for stunning and immersive experiences, offering untapped opportunities for both research and education purposes. However, preparing 3D models ready to use for AR and VR is time-consuming and requires technical expertise, which severely limits the development of new content of potential interest for structural biologists, medicinal chemists, molecular modellers and teachers. RESULTS: Herein we present the RealityConvert software tool and associated website, which allow users to easily convert molecular objects to high quality 3D models directly compatible with AR and VR applications. For chemical structures, in addition to the 3D model generation, RealityConvert also generates image trackers, useful to universally call and anchor that particular 3D model when used in AR applications. The ultimate goal of RealityConvert is to facilitate and boost the development and accessibility of AR and VR content for bioinformatics and cheminformatics applications. AVAILABILITY AND IMPLEMENTATION: http://www.realityconvert.com. CONTACT: dfourch@ncsu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology/methods , Molecular Conformation , Software , Virtual Reality , Molecular Models
ABSTRACT
Quantitative structure-activity relationships (QSAR) models are often seen as a "black box" because they are considered difficult to interpret. Meanwhile, qualitative approaches, e.g., structural alerts (SA) or read-across, provide mechanistic insight, which is preferred for regulatory purposes, but predictive accuracy of such approaches is often low. Herein, we introduce the chemistry-wide association study (CWAS) approach, a novel framework that both addresses such deficiencies and combines advantages of statistical QSAR and alert-based approaches. The CWAS framework consists of the following steps: (i) QSAR model building for an end point of interest, (ii) identification of key chemical features, (iii) determination of communities of such features that co-occur disproportionately more often in the active than in the inactive class, and (iv) assembling these communities to form larger (and not necessarily chemically connected) novel structural alerts with high specificity. As a proof-of-concept, we have applied CWAS to model Ames mutagenicity and Stevens-Johnson Syndrome (SJS). For the well-studied Ames mutagenicity data set, we identified 76 important individual fragments and assembled co-occurring fragments into SA that both replicate known mutagenicity alerts and represent novel ones. For the SJS data set, we identified 29 important fragments and assembled co-occurring communities into SA including both known and novel alerts. In summary, we demonstrate that CWAS provides a new framework to interpret predictive QSAR models and derive refined structural alerts for more effective design and safety assessment of drugs and drug candidates.
Subject(s)
Drug Discovery/methods , Mutagenicity Tests/methods , Pharmaceutical Preparations/chemistry , Quantitative Structure-Activity Relationship , Stevens-Johnson Syndrome/etiology , Factual Databases , Drug-Related Side Effects and Adverse Reactions/etiology , Humans , Biological Models
ABSTRACT
Quantitative Structure-Activity Relationship (QSAR) models typically rely on 2D and 3D molecular descriptors to characterize chemicals and forecast their experimental activities. Previously, we showed that even the most reliable 2D QSAR models and structure-based 3D molecular docking techniques were not capable of accurately ranking a set of known inhibitors for the ERK2 kinase, a key player in various types of cancer. Herein, we calculated and analyzed a series of chemical descriptors computed from the molecular dynamics (MD) trajectories of ERK2-ligand complexes. First, the docking of 87 ERK2 ligands with known binding affinities was accomplished using Schrödinger's Glide software; then, solvent-explicit MD simulations (20 ns, NPT, 300 K, TIP3P, 1 fs) were performed using the GPU-accelerated Desmond program. Second, we calculated a series of MD descriptors based on the distributions of 3D descriptors computed for representative samples of the ligand's conformations over the MD simulations. Third, we analyzed the data set of 87 inhibitors in the MD chemical descriptor space. We showed that MD descriptors (i) had little correlation with conventionally used 2D/3D descriptors, (ii) were able to distinguish the most active ERK2 inhibitors from the moderate/weak actives and inactives, and (iii) provided key and complementary information about the unique characteristics of active ligands. This study represents the largest attempt to utilize MD-extracted chemical descriptors to characterize and model a series of bioactive molecules. MD descriptors could enable the next generation of hyperpredictive MD-QSAR models for computer-aided lead optimization and analogue prioritization.
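One simple reading of "descriptors based on the distributions of 3D descriptors" is to summarize the per-frame values of each 3D descriptor with a few distribution statistics. The sketch below follows that reading. The choice of statistics, the descriptor name, and the frame values are assumptions for illustration, since the abstract does not state which summary measures were used.

```python
import statistics


def summarize_descriptor(name, frame_values):
    """Collapse per-frame values of one 3D descriptor into summary statistics.

    Each returned key is a candidate MD descriptor for the ligand.
    """
    return {
        f"{name}_mean": statistics.mean(frame_values),
        f"{name}_median": statistics.median(frame_values),
        f"{name}_stdev": statistics.stdev(frame_values),
        f"{name}_min": min(frame_values),
        f"{name}_max": max(frame_values),
    }


# Hypothetical values of one 3D descriptor sampled from trajectory frames.
toy_frames = [1.0, 2.0, 3.0]
toy_md_descriptors = summarize_descriptor("radius_of_gyration", toy_frames)
```

Repeating this for every 3D descriptor and every ligand would give one fixed-length descriptor vector per compound, regardless of how many frames were sampled from its trajectory.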
Subject(s)
Mitogen-Activated Protein Kinase 1/antagonists & inhibitors , Molecular Dynamics Simulation , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Ligands , Mitogen-Activated Protein Kinase 1/chemistry , Mitogen-Activated Protein Kinase 1/metabolism , Protein Conformation , Protein Kinase Inhibitors/metabolism , Quantitative Structure-Activity Relationship , Solvents/chemistry , Temperature
ABSTRACT
Severe adverse drug reactions (ADRs) are the fourth leading cause of fatality in the U.S. with more than 100,000 deaths per year. As up to 30% of all ADRs are believed to be caused by drug-drug interactions (DDIs), typically mediated by cytochrome P450s, possibilities to predict DDIs from existing knowledge are important. We collected data from public sources on 1485, 2628, 4371, and 27,966 possible DDIs mediated by four cytochrome P450 isoforms 1A2, 2C9, 2D6, and 3A4 for 55, 73, 94, and 237 drugs, respectively. For each of these data sets, we developed and validated QSAR models for the prediction of DDIs. As a unique feature of our approach, the interacting drug pairs were represented as binary chemical mixtures in a 1:1 ratio. We used two types of chemical descriptors: quantitative neighborhoods of atoms (QNA) and simplex descriptors. Radial basis functions with self-consistent regression (RBF-SCR) and random forest (RF) were utilized to build QSAR models predicting the likelihood of DDIs for any pair of drug molecules. Our models showed balanced accuracy of 72-79% for the external test sets with a coverage of 81.36-100% when a conservative threshold for the model's applicability domain was applied. We generated virtually all possible binary combinations of marketed drugs and employed our models to identify drug pairs predicted to be instances of DDI. More than 4500 of these predicted DDIs that were not found in our training sets were confirmed by data from the DrugBank database.
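The numbers of possible DDIs quoted above match the number of unordered drug pairs for each isoform, n(n - 1)/2. The short check below reproduces them and shows how such pairs can be enumerated. The drug names are placeholders, not entries from the study's data sets.

```python
import math
from itertools import combinations

# Number of drugs per cytochrome P450 isoform, as stated in the abstract.
drugs_per_isoform = {"1A2": 55, "2C9": 73, "2D6": 94, "3A4": 237}

# Unordered pairs of distinct drugs: n choose 2 = n * (n - 1) / 2.
possible_pairs = {
    isoform: math.comb(n_drugs, 2)
    for isoform, n_drugs in drugs_per_isoform.items()
}

# Enumerating the pairs themselves, with placeholder drug names.
placeholder_drugs = ["drug_A", "drug_B", "drug_C", "drug_D"]
placeholder_pairs = list(combinations(placeholder_drugs, 2))
```

`possible_pairs` evaluates to 1485, 2628, 4371, and 27,966 pairs for the four isoforms, which agrees with the data set sizes reported in the abstract.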
Subject(s)
Algorithms , Cytochrome P-450 Enzyme System/chemistry , Cytochrome P-450 Enzyme System/metabolism , Drug Interactions , Molecular Models , Quantitative Structure-Activity Relationship , Factual Databases , Drug-Related Side Effects and Adverse Reactions , Humans , Biological Models
ABSTRACT
There is a growing public concern about the lack of reproducibility of experimental data published in peer-reviewed scientific literature. Herein, we review the most recent alerts regarding experimental data quality and discuss initiatives taken thus far to address this problem, especially in the area of chemical genomics. Going beyond just acknowledging the issue, we propose a chemical and biological data curation workflow that relies on existing cheminformatics approaches to flag, and when appropriate, correct possibly erroneous entries in large chemogenomics data sets. We posit that the adherence to the best practices for data curation is important for both experimental scientists who generate primary data and deposit them in chemical genomics databases and computational researchers who rely on these data for model development.
Subject(s)
Genomics , Statistics as Topic/standards , Consensus , Quality Control , Quantitative Structure-Activity Relationship , Reproducibility of Results
ABSTRACT
SUMMARY: We report on the development of the high-throughput screening (HTS) Navigator software to analyze and visualize the results of HTS of chemical libraries. The HTS Navigator processes output files from different plate readers' formats, computes the overall HTS matrix, automatically detects hits and has different types of baseline navigation and correction features. The software incorporates advanced cheminformatics capabilities such as chemical structure storage and visualization, fast similarity search and chemical neighborhood analysis for retrieved hits. The software is freely available for academic laboratories. AVAILABILITY AND IMPLEMENTATION: http://fourches.web.unc.edu/
Subject(s)
Computational Biology , High-Throughput Screening Assays , Information Storage and Retrieval , Small Molecule Libraries/pharmacology , Software , Algorithms , Chemical Databases , Drug Discovery , Quantitative Structure-Activity Relationship
ABSTRACT
Skin permeability is widely considered to be mechanistically implicated in chemically-induced skin sensitization. Although many chemicals have been identified as skin sensitizers, there have been very few reports analyzing the relationships between molecular structure and skin permeability of sensitizers and non-sensitizers. The goals of this study were to: (i) compile, curate, and integrate the largest publicly available dataset of chemicals studied for their skin permeability; (ii) develop and rigorously validate QSAR models to predict skin permeability; and (iii) explore the complex relationships between skin sensitization and skin permeability. Based on the largest publicly available dataset compiled in this study, we found no overall correlation between skin permeability and skin sensitization. In addition, the cross-species correlation coefficient between human and rodent permeability data was found to be as low as R² = 0.44. Human skin permeability models based on the random forest method have been developed and validated using an OECD-compliant QSAR modeling workflow. Their external accuracy was high (Q²ext = 0.73 for 63% of external compounds inside the applicability domain). The extended analysis using both experimentally-measured and QSAR-imputed data still confirmed the absence of any overall concordance between skin permeability and skin sensitization. This observation suggests that chemical modifications that affect skin permeability should not be presumed a priori to modulate the sensitization potential of chemicals. The models reported herein as well as those developed in the companion paper on skin sensitization suggest that it may be possible to rationally design compounds with the desired high skin permeability but low sensitization potential.
Subject(s)
Allergic Contact Dermatitis/etiology , Allergic Contact Dermatitis/metabolism , Hazardous Substances/poisoning , Skin Absorption/physiology , Skin/drug effects , Skin/metabolism , Computer Simulation , Factual Databases , Allergic Contact Dermatitis/immunology , Humans , Theoretical Models , Permeability , Quantitative Structure-Activity Relationship , Skin/immunology , Software
ABSTRACT
Repetitive exposure to a chemical agent can induce an immune reaction in inherently susceptible individuals that leads to skin sensitization. Although many chemicals have been reported as skin sensitizers, there have been very few rigorously validated QSAR models with defined applicability domains (AD) that were developed using a large group of chemically diverse compounds. In this study, we have aimed to compile, curate, and integrate the largest publicly available dataset related to chemically-induced skin sensitization, use these data to generate rigorously validated QSAR models for skin sensitization, and employ these models as a virtual screening tool for identifying putative sensitizers among environmental chemicals. We followed best practices for model building and validation implemented with our predictive QSAR workflow using the Random Forest modeling technique in combination with SiRMS and Dragon descriptors. The Correct Classification Rate (CCR) for QSAR models discriminating sensitizers from non-sensitizers was 71-88% when evaluated on several external validation sets, within a broad AD, with positive (for sensitizers) and negative (for non-sensitizers) predicted rates of 85% and 79%, respectively. When compared to the skin sensitization module included in the OECD QSAR Toolbox as well as to the skin sensitization model in publicly available VEGA software, our models showed a significantly higher prediction accuracy for the same sets of external compounds as evaluated by Positive Predicted Rate, Negative Predicted Rate, and CCR. These models were applied to identify putative chemical hazards in the Scorecard database of possible skin or sense organ toxicants as primary candidates for experimental validation.
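The three metrics named in this abstract can be computed from a binary confusion matrix. The sketch below assumes the usual definitions: CCR as the mean of sensitivity and specificity (also called balanced accuracy), and the positive and negative predicted rates as the fractions of positive and negative predictions that are correct. The abstract does not spell these definitions out, and the counts are invented for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return CCR and positive/negative predicted rates for a binary classifier.

    tp/fn count sensitizers predicted correctly/incorrectly;
    tn/fp count non-sensitizers predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": 0.5 * (sensitivity + specificity),
        "positive_predicted_rate": tp / (tp + fp),
        "negative_predicted_rate": tn / (tn + fn),
    }


# Hypothetical external set: 50 sensitizers and 50 non-sensitizers.
toy_metrics = classification_metrics(tp=40, tn=30, fp=20, fn=10)
```

For these invented counts the sensitivity is 0.8 and the specificity is 0.6, giving a CCR of 0.7. Unlike plain accuracy, CCR weights both classes equally even when the external set is imbalanced.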