ABSTRACT
Studies on virus-host interactions are of high significance for a number of reasons [...].
Subject(s)
Antiviral Agents, Host-Pathogen Interactions, Antiviral Agents/pharmacology, Antiviral Agents/therapeutic use, Virus Replication
ABSTRACT
In vitro cell-line cytotoxicity is widely used in experimental studies of potential antineoplastic agents and in safety evaluation during drug discovery. In silico estimation of cytotoxicity against hundreds of tumor cell lines and dozens of normal cell lines considerably reduces the time and cost of drug development and of assessing the prospects of new pharmaceutical agents. In 2018, we developed the first freely available web application (CLC-Pred) for the qualitative prediction of cytotoxicity against 278 tumor and 27 normal cell lines based on the structural formulas of 59,882 compounds. Here, we present a new version of this web application: CLC-Pred 2.0. Like the first version, it employs the PASS (Prediction of Activity Spectra for Substances) approach, which is based on substructural atom-centric MNA descriptors and a Bayesian algorithm. CLC-Pred 2.0 provides three types of qualitative prediction: (1) cytotoxicity against 391 tumor and 47 normal human cell lines based on ChEMBL and PubChem data (128,545 structures), with a mean accuracy of prediction (AUC) of 0.925 and 0.923 as calculated by the leave-one-out (LOO CV) and 20-fold cross-validation (20F CV) procedures, respectively; (2) cytotoxicity against the NCI60 tumor cell-line panel based on the Developmental Therapeutics Program's NCI60 data (22,726 structures), with different GI50 thresholds (100, 10 and 1 nM) and a mean accuracy of prediction from 0.870 to 0.945 (LOO CV) and from 0.869 to 0.942 (20F CV); (3) 2170 molecular mechanisms of action based on ChEMBL and PubChem data (656,011 structures), with a mean accuracy of prediction of 0.979 (LOO CV) and 0.978 (20F CV). CLC-Pred 2.0 is therefore a significant extension of the capabilities of the initial web application.
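The "mean accuracy of prediction (AUC)" quoted above can be illustrated with a minimal sketch. It is not the CLC-Pred or PASS code: the function names `roc_auc` and `mean_auc` and the toy scores are hypothetical, and the sketch only assumes that one AUC is computed per cell line from held-out prediction scores and that the per-cell-line values are then averaged.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    scores: predicted scores for held-out compounds (hypothetical values).
    labels: 1 for cytotoxic (active), 0 for inactive.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_cell_line):
    """Average the AUC over cell lines.

    per_cell_line maps a cell-line name to a (scores, labels) pair.
    """
    aucs = [roc_auc(scores, labels) for scores, labels in per_cell_line.values()]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy held-out predictions for two made-up cell lines.
    toy = {
        "line_A": ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
        "line_B": ([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]),
    }
    print(mean_auc(toy))
```

The pairwise definition is quadratic in the number of compounds; it is used here only because it makes the meaning of AUC explicit.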
Subject(s)
Antineoplastic Agents, Software, Humans, Bayes Theorem, Antineoplastic Agents/pharmacology, Antineoplastic Agents/chemistry, Prednisone, Tumor Cell Line
ABSTRACT
The growing amount of experimental data on chemical objects includes properties of small molecules, results of studies of their interaction with human and animal proteins, and methods of synthesis of organic compounds (OCs). These data can be used to identify the names of OCs automatically, including all possible synonyms and relevant data on molecular properties and biological activity. Using the different synonymous names of chemical compounds allows researchers to collect more complete data on their properties from publications. Enriching the data on compound names with information about their possible metabolites can help estimate the biological effects of parent compounds and their metabolites more thoroughly. Automated extraction of the names of parent compounds and their metabolites from texts is therefore an important task. In our study, we aimed to develop a method for extracting the named entities (NEs) of parent compounds and their metabolites from abstracts of scientific publications. Using the conditional random fields algorithm, we extracted the NEs of chemical compounds. We developed a set of rules for identifying the NEs of parent compounds and their metabolites in the texts. We evaluated the possibility of extracting the names of potential metabolites based on the cosine similarity between strings representing the names of parent compounds and all other chemical NEs found in the text. Additionally, we used conditional random fields trained on a manually labeled corpus to extract the names of parent compounds and their metabolites from the texts. Our computational experiments showed that combining rules with cosine similarity could increase the accuracy of recognizing metabolite names compared with either the rule-based algorithm or the machine-learning algorithm (conditional random fields).
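The cosine-similarity step can be sketched as follows. The abstract does not say how the name strings are turned into vectors, so this sketch assumes character trigram counts; the functions `char_ngrams`, `cosine_similarity` and `rank_metabolite_candidates`, the threshold of 0.5, and the example names are all hypothetical and are not taken from the authors' method.

```python
from collections import Counter
from math import sqrt


def char_ngrams(name, n=3):
    """Count character n-grams of a lower-cased chemical name (assumed representation)."""
    text = name.lower()
    if not text:
        return Counter()
    if len(text) < n:
        return Counter([text])
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(name_a, name_b, n=3):
    """Cosine of the angle between the n-gram count vectors of two names."""
    va, vb = char_ngrams(name_a, n), char_ngrams(name_b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def rank_metabolite_candidates(parent, entities, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(e, cosine_similarity(parent, e)) for e in entities if e != parent]
    kept = [(e, s) for e, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Chemical NEs found in one abstract (made-up example).
    entities = ["morphine", "normorphine", "aspirin"]
    print(rank_metabolite_candidates("morphine", entities))
```

String similarity of this kind only captures metabolites whose names are derived from the parent name, which is presumably why the study combines it with rules.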
Subject(s)
Algorithms, Proteins, Animals, Humans, Machine Learning
ABSTRACT
Large amounts of high-quality data on the biological activity of chemical compounds are required throughout the drug discovery process, from the development of computational models of structure-activity relationships to the experimental testing of lead compounds and their validation in the clinic. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Although free and commercial databases of the biological activities of compounds exist, they usually lack unambiguous information about the specifics of the biological assays. Scientific papers, on the other hand, are the primary source of new data disclosed to the scientific community for the first time. In this study, we developed and validated a data-mining approach for extracting text fragments that describe bioassays. We used this approach to evaluate compounds and their biological activity reported in scientific publications. We found that papers can be categorized as relevant or irrelevant based on machine-learning analysis of their abstracts. Text fragments extracted from the full texts of publications allow the papers to be further partitioned into several classes according to the specifics of the bioassays. We demonstrate the applicability of our approach to comparing the endpoint values of biological activity and cytotoxicity of reference compounds.
Subject(s)
Data Mining/methods, Drug Discovery/methods, Factual Databases, HIV Infections/drug therapy, HIV Reverse Transcriptase/antagonists & inhibitors, HIV-1/drug effects, HIV-1/enzymology, Humans, PubMed, Reverse Transcriptase Inhibitors/pharmacology
ABSTRACT
Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we address the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases for QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for creating modeling (i.e., training and test) sets from two databases, one commercial and one freely available: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One such limitation is the general lack, in these databases, of complete, semantic, computer-parsable descriptions of assay methodology that would allow one to determine the mix-and-matchability of result sets at the assay level.
Subject(s)
Pharmaceutical Databases, HIV Reverse Transcriptase/antagonists & inhibitors, HIV-1/enzymology, Statistical Models, Quantitative Structure-Activity Relationship, Reverse Transcriptase Inhibitors/chemistry, Reverse Transcriptase Inhibitors/pharmacology, Algorithms, Drug Discovery, Viral Drug Resistance, HIV-1/drug effects
ABSTRACT
Drug resistance of pathogens, including viruses, is one of the reasons for decreased efficacy of therapy. Considering the impact of HIV type 1 (HIV-1) on the development of progressive immune dysfunction and the rapid development of drug resistance, the analysis of HIV-1 resistance is of high significance. A substantial amount of data on HIV-1 drug resistance has now been accumulated and can be used to build both qualitative and quantitative models of HIV-1 drug resistance. Quantitative models of drug resistance can enrich the information about the efficacy of a particular drug within an antiretroviral therapy regimen. In our study, we investigated the possibility of developing models for quantitative prediction of HIV-1 resistance to eight protease inhibitors based on the analysis of amino acid sequences of HIV-1 protease for 900 virus variants. We developed random forest regression (RFR), support vector regression (SVR), and self-consistent regression (SCR) models. The independent variables were binary vectors whose values are 0 or 1 depending on the presence of a specific peptide fragment in each amino acid sequence; the predicted variable was the fold ratio, which reflects the level of resistance. The SVR and SCR models showed the highest predictive performance. The models demonstrate reasonable performance for eight out of nine protease inhibitors (R2 varied from 0.828 to 0.909), while R2 for predicting the tipranavir fold ratio was lower (0.642). We believe that the developed approach can be applied to evaluate drug resistance of molecular targets of other viruses where appropriate experimental data are available.
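The binary encoding of sequences described above can be sketched as follows. The abstract does not give the fragment length or how fragments are selected, so this sketch assumes all overlapping fragments of a fixed length `k`; the helper names `peptide_fragments`, `build_vocabulary` and `encode`, and the short toy sequences, are hypothetical.

```python
def peptide_fragments(sequence, k=3):
    """Return the set of overlapping peptide fragments of length k (assumed fragment scheme)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_vocabulary(sequences, k=3):
    """Collect every fragment seen in the training sequences, in a fixed sorted order."""
    vocabulary = set()
    for sequence in sequences:
        vocabulary |= peptide_fragments(sequence, k)
    return sorted(vocabulary)


def encode(sequence, vocabulary, k=3):
    """Binary vector: 1 if the fragment occurs in the sequence, otherwise 0."""
    present = peptide_fragments(sequence, k)
    return [1 if fragment in present else 0 for fragment in vocabulary]


if __name__ == "__main__":
    # Two toy five-residue variants that differ at one position (not real data).
    variants = ["PQITL", "PQVTL"]
    vocabulary = build_vocabulary(variants)
    for variant in variants:
        print(variant, encode(variant, vocabulary))
```

Vectors of this form could then be passed as the independent variables to any regression model, with the measured fold ratio as the target; the choice of model is outside this sketch.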
Subject(s)
Viral Drug Resistance, HIV Infections, HIV Protease Inhibitors, HIV Protease, HIV-1, HIV-1/drug effects, HIV-1/genetics, Viral Drug Resistance/genetics, Humans, HIV Infections/virology, HIV Infections/drug therapy, HIV Protease/genetics, HIV Protease/metabolism, HIV Protease Inhibitors/pharmacology, HIV Protease Inhibitors/therapeutic use, Amino Acid Sequence, Anti-HIV Agents/pharmacology, Anti-HIV Agents/therapeutic use
ABSTRACT
Introduction: It is difficult to create direct antiviral drugs for every virus, including those causing new, suddenly emerging infections such as COVID-19. Therefore, pathogenesis-directed therapy is often necessary to treat severe viral infections and the comorbidities associated with them. Despite significant differences in the etiopathogenesis of viral diseases, they are generally associated with significant dysfunction of the immune system. Studying the common mechanisms of immune dysfunction caused by different viral infections can help develop novel therapeutic strategies to combat infections and associated comorbidities. Methods: To identify common mechanisms of immune function disruption during infection by nine different viruses (cytomegalovirus, Epstein-Barr virus, human T-cell leukemia virus type 1, hepatitis B and C viruses, human immunodeficiency virus, Dengue virus, SARS-CoV, and SARS-CoV-2), we analyzed the corresponding transcription profiles from peripheral blood mononuclear cells (PBMC) using an originally developed pipeline that includes transcriptome data collection, processing, normalization, analysis, and a search for master regulators of several viral infections. Ten datasets containing transcription data from patients infected by the nine viruses and from healthy people were obtained from Gene Expression Omnibus. The data were analyzed with the Genome Enhancer pipeline. Results: We revealed common pathways, cellular processes, and master regulators for the studied viral infections. We found that all nine viral infections cause immune activation, exhaustion, cell proliferation disruption, and increased susceptibility to apoptosis. Using network analysis, we identified PBMC receptors, representing proteins at the top of signaling pathways, that may be responsible for the observed transcriptional changes and maintain the current functional state of cells.
Discussion: The identified relationships between some of these receptors and virus-induced alteration of immune functions are new and have not been reported previously, e.g., for the receptors for autocrine motility factor, insulin, prolactin, angiotensin II, and immunoglobulin epsilon. Modulation of the identified receptors can be investigated as a therapeutic strategy for the treatment of severe viral infections.
Subject(s)
COVID-19, Viruses, Humans, Mononuclear Leukocytes, Transcriptome, Antiviral Agents/pharmacology, Immunity
ABSTRACT
Predicting viral drug resistance is a significant medical concern. The importance of this problem stimulates the continuous development of experimental and new computational approaches. The use of computational approaches allows researchers to increase therapy effectiveness and reduce the time and expenses involved when the prescribed antiretroviral therapy is ineffective in the treatment of infection caused by the human immunodeficiency virus type 1 (HIV-1). We propose two machine learning methods and the appropriate models for predicting HIV drug resistance related to amino acid substitutions in HIV targets: (i) k-mers utilizing the random forest and the support vector machine algorithms of the scikit-learn library, and (ii) multi-n-grams using the Bayesian approach implemented in MultiPASSR software. Both multi-n-grams and k-mers were computed based on the amino acid sequences of HIV enzymes: reverse transcriptase and protease. The performance of the models was estimated by five-fold cross-validation. The resulting classification models have a relatively high reliability (minimum accuracy for the drugs is 0.82, maximum: 0.94) and were used to create a web application, HVR (HIV drug Resistance), for the prediction of HIV drug resistance to protease inhibitors and nucleoside and non-nucleoside reverse transcriptase inhibitors based on the analysis of the amino acid sequences of the appropriate HIV proteins from clinical samples.
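The five-fold cross-validation used to estimate model performance can be sketched in plain Python. This is an illustration of the general procedure only, not the HVR code: the names `k_fold_splits` and `cross_val_accuracy`, the `train_fn` callback, and the toy data are hypothetical, and pooling the correct predictions over all folds is one assumed way of reporting accuracy.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def cross_val_accuracy(features, labels, train_fn, k=5, seed=0):
    """Pooled accuracy over k folds.

    train_fn(train_features, train_labels) must return a predict(x) callable;
    each sample is predicted exactly once, by a model that never saw it.
    """
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(features), k, seed):
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct += sum(predict(features[i]) == labels[i] for i in test_idx)
    return correct / len(features)


def majority_class_trainer(train_features, train_labels):
    """Baseline 'model' that always predicts the most frequent training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda x: majority


if __name__ == "__main__":
    # Toy data: 8 "resistant" (1) and 2 "susceptible" (0) samples.
    toy_features = list(range(10))
    toy_labels = [1] * 8 + [0] * 2
    print(cross_val_accuracy(toy_features, toy_labels, majority_class_trainer))
```

A majority-class baseline like the one above is a useful reference point when reading accuracy figures for imbalanced resistance data.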
Subject(s)
Anti-HIV Agents, HIV Infections, Humans, Anti-HIV Agents/pharmacology, Anti-HIV Agents/therapeutic use, Bayes Theorem, Amino Acid Substitution, Reproducibility of Results, HIV Reverse Transcriptase/genetics, Reverse Transcriptase Inhibitors/pharmacology, HIV Infections/drug therapy, Viral Drug Resistance/genetics, HIV Protease/genetics
ABSTRACT
HIV-1 integrase (IN) plays an important role in the life cycle of HIV and is responsible for integration of the virus into the human genome. We present computational approaches used to design novel HIV-1 IN inhibitors. We created an IN inhibitor database by collecting experimental data from the literature. Using this database, we developed quantitative structure-activity relationship (QSAR) models of HIV-1 IN strand transfer (ST) inhibitors. The prediction accuracy of these models was estimated by external 5-fold cross-validation as well as with an additional validation set of 308 structurally distinct compounds from the publicly accessible BindingDB database. The validated models were used to screen a small combinatorial library of potential synthetic candidates to identify hits, and a subsequent docking step was applied to filter these further, arriving at a small set of potential HIV-1 IN inhibitors. As a result, 236 compounds with good druglikeness properties and correct docking poses were identified as potential candidates for synthesis. One of the six compounds finally chosen for synthesis was experimentally confirmed to inhibit the ST reaction with an IC50(ST) of 37 µM. The IN inhibitor database is available for download from http://cactus.nci.nih.gov/download/iidb/.