ABSTRACT
DrugCentral monitors new drug approvals and standardizes drug information. The current update contains 285 drugs (131 for human use). New additions include: (i) the integration of veterinary drugs (154 for animal use only), (ii) the addition of 66 documented off-label uses and (iii) the identification of adverse drug events from pharmacovigilance data for pediatric and geriatric patients. Additional enhancements include chemical substructure searching using SMILES and 'Target Cards' based on UniProt accession codes. Statistics of interest include the following: (i) 60% of the covered drugs are on-market drugs with expired patent and exclusivity coverage, 17% are off-market, and 23% are on-market drugs with active patent and exclusivity coverage; (ii) at the level of the active ingredients, 59% of the drugs are oral, 33% are parenteral and 18% are topical (an ingredient can have more than one route); (iii) only 3% of all drugs are for animal use only; however, 61% of the veterinary drugs are also approved for human use; (iv) dogs, cats and horses are by far the most represented target species for veterinary drugs; (v) the physicochemical property profile of animal drugs is very similar to that of human drugs. Use cases include azaperone, the only sedative approved for swine, and ruxolitinib, a Janus kinase inhibitor.
Subject(s)
Drug Approval; Drug-Related Side Effects and Adverse Reactions; Veterinary Drugs; Animals; Humans; Drug-Related Side Effects and Adverse Reactions/veterinary; Veterinary Drugs/administration & dosage; Veterinary Drugs/adverse effects; Off-Label Use/veterinary
ABSTRACT
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products of the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
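The abstract does not state which statistic Pharos uses for its enrichment calculations. One common way to ask whether an annotation is over-represented in a user-selected subset of targets is a one-sided hypergeometric test; the minimal sketch below assumes that choice. The function name and arguments are hypothetical and are not part of the Pharos API.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for annotation enrichment.

    N: number of targets in the background (e.g., the whole proteome)
    K: background targets that carry the annotation
    n: number of targets in the user's subset
    k: subset targets that carry the annotation
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)
    # Sum the probabilities of drawing k or more annotated targets into the subset.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical example: 8 of 20 selected targets share an annotation
# that only 50 of 20,000 background proteins carry.
print(hypergeom_enrichment_p(k=8, n=20, K=50, N=20_000))
```

In practice, p-values for many annotations would also need a multiple-testing correction before being shown in a heat map.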
Subject(s)
Databases, Factual; Molecular Targeted Therapy; Proteome; Humans; Biological Products; Drug Discovery; Internet; Proteome/drug effects
ABSTRACT
DrugCentral is a public resource (http://drugcentral.org) that serves the scientific community by providing up-to-date drug information, as described in previous papers. The current release includes 109 newly approved (October 2018 through March 2020) active pharmaceutical ingredients in the US, Europe, Japan and other countries, and two molecular entities (e.g. mefuparib) of interest for COVID-19. New additions include a set of pharmacokinetic properties for ~1000 drugs, a sex-based separation of side effects processed from FAERS (FDA Adverse Event Reporting System), and a drug repositioning prioritization scheme based on the market availability and intellectual property rights for FDA-approved drugs. In the context of the COVID-19 pandemic, we also incorporated REDIAL-2020, a machine learning platform that estimates anti-SARS-CoV-2 activities, as well as a 'drugs in news' feature that briefly lists the drugs of greatest current interest. The full database dump and data files are available for download from the DrugCentral web portal.
Subject(s)
Antiviral Agents/therapeutic use; COVID-19 Drug Treatment; Databases, Pharmaceutical/statistics & numerical data; Drug Approval/statistics & numerical data; Drug Discovery/statistics & numerical data; Drug Repositioning/statistics & numerical data; SARS-CoV-2/drug effects; Antiviral Agents/adverse effects; Antiviral Agents/pharmacokinetics; COVID-19/epidemiology; COVID-19/virology; Drug Approval/methods; Drug Discovery/methods; Drug Repositioning/methods; Epidemics; Europe; Humans; Information Storage and Retrieval/methods; Internet; Japan; SARS-CoV-2/physiology; United States
ABSTRACT
In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets based on the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, TCRD and Pharos have had two major releases, which incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.
Subject(s)
Databases, Factual; Genome, Human; Neurodegenerative Diseases/genetics; Proteomics/methods; Software; Virus Diseases/genetics; Animals; Anticonvulsants/chemistry; Anticonvulsants/therapeutic use; Antiviral Agents/chemistry; Antiviral Agents/therapeutic use; Biological Products/chemistry; Biological Products/therapeutic use; Data Mining/statistics & numerical data; Host-Pathogen Interactions/drug effects; Host-Pathogen Interactions/genetics; Humans; Internet; Machine Learning/statistics & numerical data; Mice; Mice, Knockout; Molecular Targeted Therapy/methods; Neurodegenerative Diseases/classification; Neurodegenerative Diseases/drug therapy; Neurodegenerative Diseases/virology; Protein Interaction Mapping; Proteome/agonists; Proteome/antagonists & inhibitors; Proteome/genetics; Proteome/metabolism; Small Molecule Libraries/chemistry; Small Molecule Libraries/therapeutic use; Virus Diseases/classification; Virus Diseases/drug therapy; Virus Diseases/virology
ABSTRACT
BACKGROUND: LINCS, "Library of Integrated Network-based Cellular Signatures", and IDG, "Illuminating the Druggable Genome", are both NIH projects and consortia that have generated rich datasets for the study of the molecular basis of human health and disease. LINCS L1000 expression signatures provide unbiased systems/omics experimental evidence. IDG provides compiled and curated knowledge for illumination and prioritization of novel drug target hypotheses. Together, these resources can support a powerful new approach to identifying novel drug targets for complex diseases such as Parkinson's disease (PD), which continues to inflict severe harm on human health and resists traditional research approaches. RESULTS: Integrating LINCS and IDG, we built the Knowledge Graph Analytics Platform (KGAP) to support an important use case: identification and prioritization of drug target hypotheses for associated diseases. The KGAP approach includes strong semantics interpretable by domain scientists and a robust, high-performance implementation of a graph database and related analytical methods. To illustrate the value of our approach, we investigated results from queries relevant to PD. Approved PD drug indications from IDG's resource DrugCentral were used as starting points for evidence paths exploring chemogenomic space via LINCS expression signatures for associated genes, which were then evaluated as target hypotheses by integration with IDG. The KG-analytic scoring function was validated against a gold-standard dataset of PD-associated genes, defined as elucidated, published mechanism-of-action drug targets, also from DrugCentral. IDG's resource TIN-X was used to rank and filter KGAP results for novel PD targets, and one of them, SYNGR3 (Synaptogyrin-3), was investigated further manually as a case study and plausible new drug target for PD. 
CONCLUSIONS: The synergy of LINCS and IDG, via KG methods, empowers graph analytics methods for the investigation of the molecular basis of complex diseases, and specifically for identification and prioritization of novel drug targets. The KGAP approach enables downstream applications via integration with resources similarly aligned with modern KG methodology. The generality of the approach indicates that KGAP is applicable to many disease areas, in addition to PD, the focus of this paper.
Subject(s)
Parkinson Disease; Gene Library; Genome; Humans; Lighting; Parkinson Disease/drug therapy; Parkinson Disease/genetics; Pattern Recognition, Automated
ABSTRACT
MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association, linking a protein-coding gene and a phenotype, is evaluated for confidence, with scores derived solely from aggregated statistics. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
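The abstract names meanRank as the way multivariate evidence is aggregated but does not spell it out. As a hedged reading, meanRank can be sketched as ranking genes separately on each evidence variable and then averaging those ranks. The variable names (`n_study`, `rcras`), the "larger is better" direction for every variable, and the average-rank treatment of ties are all assumptions for illustration, not the TIGA implementation.

```python
from statistics import mean


def rank_descending(values: list[float]) -> list[float]:
    """Rank values so the largest gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for idx in order[start:end + 1]:
            ranks[idx] = average
        start = end + 1
    return ranks


def mean_rank(evidence: dict[str, list[float]]) -> list[float]:
    """Average each gene's rank across evidence variables (lower means stronger).

    evidence maps a variable name to per-gene values, all in the same gene order;
    larger raw values are assumed to indicate stronger evidence.
    """
    per_variable = [rank_descending(values) for values in evidence.values()]
    n_genes = len(per_variable[0])
    return [mean(ranks[g] for ranks in per_variable) for g in range(n_genes)]


# Hypothetical evidence for three genes associated with one trait.
print(mean_rank({"n_study": [3, 1, 2], "rcras": [2.0, 0.5, 1.0]}))
```

Averaging ranks rather than raw values keeps variables with very different scales (study counts versus citation ratios) from dominating one another.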
Subject(s)
Genome-Wide Association Study; Lighting; Genotype; Polymorphism, Single Nucleotide; Phenotype
ABSTRACT
DrugCentral is a drug information resource (http://drugcentral.org) open to the public since 2016 and previously described in the 2017 Nucleic Acids Research Database issue. Since the 2016 release, 103 newly approved drugs have been added. The following new data sources have been included: Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS), FDA Orange Book information, L1000 gene perturbation profile distance/similarity matrices and estimated protonation constants. New and existing entries have been updated with the latest information from scientific literature, drug labels and external databases. The web interface has been updated to display and query new data. The full database dump and data files are available for download from the DrugCentral website.
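The abstract mentions L1000 gene perturbation profile distance/similarity matrices without naming the metric. The minimal sketch below assumes cosine similarity between per-drug expression signature vectors (with distance taken as 1 minus similarity); the drug names and vectors are hypothetical, and DrugCentral may use a different measure.

```python
from math import sqrt


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length expression signatures."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same length")
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare an all-zero signature")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def similarity_matrix(signatures: dict[str, list[float]]):
    """Return (sorted drug names, square matrix of pairwise cosine similarities)."""
    names = sorted(signatures)
    matrix = [[cosine_similarity(signatures[a], signatures[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical signatures: per-gene expression changes after treatment.
names, sim = similarity_matrix({
    "drug_a": [1.0, -2.0, 0.5],
    "drug_b": [-1.0, 2.0, -0.5],  # opposite effect to drug_a
    "drug_c": [2.0, 1.0, 0.0],    # unrelated (orthogonal) to drug_a
})
distance = [[1 - s for s in row] for row in sim]
```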
Subject(s)
Databases, Pharmaceutical; Drug Approval/statistics & numerical data; Drug-Related Side Effects and Adverse Reactions; Gene Expression/drug effects; Pharmaceutical Preparations/classification; Proteins/classification
ABSTRACT
DrugCentral (http://drugcentral.org) is an open-access online drug compendium. DrugCentral integrates structure, bioactivity, regulatory, pharmacologic actions and indications for active pharmaceutical ingredients approved by the FDA and other regulatory agencies. Monitoring of regulatory agencies for new drug approvals ensures that the resource is up to date. DrugCentral integrates content for active ingredients with pharmaceutical formulations, indexing drugs and drug label annotations, complementing similar resources available online. Its complementarity with other online resources is facilitated by cross-referencing to external resources. At the molecular level, DrugCentral bridges drug-target interactions with pharmacological action and indications. The integration with FDA drug labels enables text mining applications for drug adverse events and clinical trial information. Chemical structure overlap between DrugCentral and five online drug resources, and the overlap between DrugCentral FDA-approved drugs and their presence in four different chemical collections, are discussed. DrugCentral can be accessed via the web application or downloaded in relational database format.
Subject(s)
Databases, Pharmaceutical; Search Engine; Web Browser; Drug Approval; Drug Compounding; Drug Interactions; Drug Labeling; Drug-Related Side Effects and Adverse Reactions; Humans; Pharmaceutical Preparations/chemistry; United States; United States Food and Drug Administration
ABSTRACT
MOTIVATION: The increasing number of peer-reviewed manuscripts requires the development of specific mining tools to facilitate the visual exploration of evidence linking diseases and proteins. RESULTS: We developed TIN-X, the Target Importance and Novelty eXplorer, to visualize the association between proteins and diseases, based on text mining data processed from the scientific literature. In the current implementation, TIN-X supports exploration of data for G-protein-coupled receptors, kinases, ion channels, and nuclear receptors. TIN-X supports browsing and navigating across proteins and diseases based on ontology classes, and displays a scatter plot with two proposed new bibliometric statistics: Importance and Novelty. AVAILABILITY AND IMPLEMENTATION: http://www.newdrugtargets.org. CONTACT: cbologa@salud.unm.edu.
Subject(s)
Data Mining/methods; Disease/etiology; Software; Biological Ontologies; Computer Graphics; Humans; Ion Channels/metabolism; Phosphotransferases/metabolism; Receptors, Cytoplasmic and Nuclear/metabolism; Receptors, G-Protein-Coupled/metabolism
ABSTRACT
The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. The KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which underlies the web-based informatics platform Pharos. These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.
Subject(s)
Genome; Knowledge Management; Humans; Proteome; Databases, Factual; Informatics
ABSTRACT
Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements, called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using a representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples drawn from approximately the same distribution as the positive examples, then model performance as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Availability and implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
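To make the contrast concrete, here is a minimal sketch of uniform versus degree-aware negative sampling, assuming that "degree-aware" means drawing each endpoint with probability proportional to its degree among the positive edges. It is an illustration of the idea only, not the code in the linked repository; all function names are hypothetical.

```python
import random


def _sample_negatives(draw_pair, positives, n, max_tries=100_000):
    """Draw candidate pairs until n distinct pairs are found that are neither
    self-loops nor known positive edges."""
    known = {frozenset(edge) for edge in positives}
    negatives = set()
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        u, v = draw_pair()
        pair = frozenset((u, v))
        if u != v and pair not in known:
            negatives.add(pair)
    if len(negatives) < n:
        raise ValueError("could not find enough negative pairs")
    return sorted(tuple(sorted(pair)) for pair in negatives)


def uniform_negatives(nodes, positives, n, rng):
    """Baseline: every node is equally likely to be an endpoint."""
    return _sample_negatives(lambda: rng.sample(nodes, 2), positives, n)


def degree_aware_negatives(positives, n, rng):
    """Endpoints are drawn in proportion to their degree among the positive
    edges, so negatives roughly match the positives' degree distribution."""
    pool = [node for edge in positives for node in edge]  # node repeated once per edge
    return _sample_negatives(lambda: (rng.choice(pool), rng.choice(pool)), positives, n)


rng = random.Random(42)
positives = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("d", "e")]
nodes = ["hub", "a", "b", "c", "d", "e", "f", "g"]
print(uniform_negatives(nodes, positives, 3, rng))
print(degree_aware_negatives(positives, 3, rng))
```

With uniform sampling, low-degree nodes such as `f` and `g` appear in negatives as often as the hub, so a classifier can separate positives from negatives by degree alone; drawing from the degree-weighted pool removes much of that shortcut.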
ABSTRACT
Tuberculosis (TB) is still a major global health challenge, killing over 1.5 million people each year; hence, there is a need to identify and develop novel treatments for Mycobacterium tuberculosis (M. tuberculosis). The prevalence of infections caused by nontuberculous mycobacteria (NTM) is also increasing and has overtaken TB cases in the United States and much of the developed world. Mycobacterium abscessus (M. abscessus) is one of the most frequently encountered NTM and is difficult to treat. We describe how drug-disease associations derived from a semantic knowledge graph, combined with machine learning models, enabled the identification of several molecules for testing for anti-mycobacterial activity. We established that niclosamide (M. tuberculosis IC90 2.95 µM; M. abscessus IC90 59.1 µM) and tribromsalan (M. tuberculosis IC90 76.92 µM; M. abscessus IC90 147.4 µM) inhibit M. tuberculosis and M. abscessus in vitro. To investigate the mode of action, we determined the transcriptional response of M. tuberculosis and M. abscessus to both compounds in axenic log phase, demonstrating a broad effect on gene expression that differed from that of known M. tuberculosis inhibitors. Both compounds elicited transcriptional responses indicative of respiratory pathway stress and the dysregulation of fatty acid metabolism.
Subject(s)
Mycobacterium Infections, Nontuberculous; Mycobacterium abscessus; Mycobacterium tuberculosis; Salicylanilides; Tuberculosis; Humans; Mycobacterium tuberculosis/genetics; Mycobacterium Infections, Nontuberculous/microbiology; Niclosamide/pharmacology; Drug Repositioning; Nontuberculous Mycobacteria/genetics; Tuberculosis/drug therapy; Tuberculosis/microbiology
ABSTRACT
TIN-X (Target Importance and Novelty eXplorer) is an interactive visualization tool for illuminating associations between diseases and potential drug targets and is publicly available at newdrugtargets.org. TIN-X uses natural language processing to identify disease and protein mentions within PubMed content, using previously published tools for named entity recognition (NER) of gene/protein and disease names. Target data are obtained from the Target Central Resource Database (TCRD). Two important metrics, novelty and importance, are computed from these data; when plotted as log(importance) vs. log(novelty), they help the user visually explore the novelty of drug targets and their associated importance to diseases. TIN-X Version 3.0 has been significantly improved with an expanded dataset, a modernized architecture including a REST API, and an improved user interface (UI). The dataset has been expanded to include not only PubMed publication titles and abstracts, but also full-text articles when available. This results in approximately 9-fold more target/disease associations compared to previous versions of TIN-X. Additionally, the TIN-X database containing this expanded dataset is now hosted in the cloud via Amazon RDS. Recent enhancements to the UI focus on making it more intuitive for users to find diseases or drug targets of interest, while providing a new, sortable table-view mode to accompany the existing plot-view mode. UI improvements also help the user browse the associated PubMed publications to explore and understand the basis of TIN-X's predicted association between a specific disease and a target of interest. While implementing these upgrades, computational resources are balanced between the webserver and the user's web browser to achieve adequate performance while accommodating the expanded dataset. 
Together, these advances aim to extend the duration that users can benefit from TIN-X while providing both an expanded dataset and new features that researchers can use to better illuminate understudied proteins.
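The abstracts above name the novelty and importance metrics but do not give their formulas. The sketch below assumes fractional-counting definitions: each article with T recognized targets credits 1/T to each target, novelty is the inverse of a target's total credit, and importance of a disease-target pair sums 1/(T x D) over articles mentioning both (D being the number of diseases in the article). These definitions are an assumption to be checked against the TIN-X papers; the function names and toy data are hypothetical.

```python
from collections import defaultdict
from math import log10


def tinx_scores(publications):
    """publications: iterable of (targets, diseases) sets recognized in each article.

    Returns (novelty per target, importance per (disease, target) pair).
    """
    fractional_count = defaultdict(float)
    importance = defaultdict(float)
    for targets, diseases in publications:
        if not targets:
            continue
        for target in targets:
            # Each article contributes 1/T to every one of its T targets.
            fractional_count[target] += 1 / len(targets)
        if diseases:
            weight = 1 / (len(targets) * len(diseases))
            for target in targets:
                for disease in diseases:
                    importance[(disease, target)] += weight
    # Rarely mentioned targets get a small total credit, hence high novelty.
    novelty = {target: 1 / count for target, count in fractional_count.items()}
    return novelty, dict(importance)


def plot_points(novelty, importance, disease):
    """(x, y) = (log10 novelty, log10 importance) for each target linked to one disease."""
    return {
        target: (log10(novelty[target]), log10(score))
        for (d, target), score in importance.items()
        if d == disease
    }


# Toy input: three articles with their recognized targets and diseases.
pubs = [({"T1"}, {"D1"}), ({"T1", "T2"}, {"D1"}), ({"T2"}, set())]
novelty, importance = tinx_scores(pubs)
print(plot_points(novelty, importance, "D1"))
```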
Subject(s)
User-Computer Interface; Humans; Natural Language Processing; PubMed; Software
ABSTRACT
BACKGROUND: Birth defects are functional and structural abnormalities that impact about 1 in 33 births in the United States. They have been attributed to genetic and other factors such as drugs, cosmetics, food, and environmental pollutants during pregnancy, but for most birth defects there are no known causes. METHODS: To further characterize associations between small molecule compounds and their potential to induce specific birth abnormalities, we gathered knowledge from multiple sources to construct a reproductive toxicity Knowledge Graph (ReproTox-KG) with a focus on associations between birth defects, drugs, and genes. Specifically, we gathered data from drug/birth-defect associations from co-mentions in published abstracts, gene/birth-defect associations from genetic studies, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecules. RESULTS: Using ReproTox-KG and semi-supervised learning (SSL), we scored >30,000 preclinical small molecules for their potential to cross the placenta and induce birth defects, and identified >500 birth-defect/gene/drug cliques that can be used to explain molecular mechanisms for drug-induced birth defects. The ReproTox-KG can be accessed via a web-based user interface available at https://maayanlab.cloud/reprotox-kg . This site enables users to explore the associations between birth defects, approved and preclinical drugs, and all human genes. CONCLUSIONS: ReproTox-KG provides a resource for exploring knowledge about the molecular mechanisms of birth defects with the potential of predicting the likelihood of genes and preclinical small molecules to induce birth defects.
While birth defects are common, for most birth defects there are no known causes. During pregnancy, developing babies are exposed to drugs, cosmetics, food, and environmental pollutants that may cause birth defects. However, exactly how these environmental factors are involved in producing birth defects is difficult to discern. Also, birth defects can be a consequence of the genes inherited from the parents. We combined general data about human genes and drugs with specific data previously implicating genes and drugs in inducing birth defects to create a knowledge graph representation that connects genes, drugs, and birth defects. This knowledge graph can be used to explore new links that may explain why birth defects occur, particularly those that result from a combination of inherited and environmental influences.
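The ReproTox-KG abstract reports birth-defect/gene/drug cliques without describing how they are found. Assuming a clique here means a triangle in which the drug is linked to the birth defect, the gene is linked to the birth defect, and the drug is linked to the gene, the minimal sketch below enumerates such triples from three edge sets. The edge sources named in the docstring mirror the data types listed in the abstract; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def find_cliques(drug_defect, gene_defect, drug_gene):
    """Return sorted (defect, gene, drug) triples whose three pairwise links all exist.

    drug_defect: set of (drug, defect) pairs, e.g. from abstract co-mentions
    gene_defect: set of (gene, defect) pairs, e.g. from genetic studies
    drug_gene:   set of (drug, gene) pairs, e.g. known targets or expression changes
    """
    genes_by_defect = defaultdict(set)
    for gene, defect in gene_defect:
        genes_by_defect[defect].add(gene)
    cliques = set()
    for drug, defect in drug_defect:
        # Only genes already linked to this defect can close the triangle.
        for gene in genes_by_defect.get(defect, ()):
            if (drug, gene) in drug_gene:
                cliques.add((defect, gene, drug))
    return sorted(cliques)


# Toy edges (not data from ReproTox-KG).
print(find_cliques(
    drug_defect={("drug_x", "defect_1"), ("drug_y", "defect_1")},
    gene_defect={("gene_a", "defect_1"), ("gene_b", "defect_2")},
    drug_gene={("drug_x", "gene_a"), ("drug_y", "gene_b")},
))
```

Each triangle pairs a drug-defect association with a candidate gene that could explain it mechanistically, which is why such cliques are useful for generating hypotheses.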
ABSTRACT
The Illuminating the Druggable Genome (IDG) consortium is a National Institutes of Health (NIH) Common Fund program designed to enhance our knowledge of under-studied proteins, more specifically, proteins that are poorly annotated within the three most commonly drug-targeted protein families: G-protein coupled receptors, ion channels, and protein kinases. Since 2014, the IDG Knowledge Management Center (IDG-KMC) has generated several open-access datasets and resources that jointly serve as a highly translational, machine-learning-ready knowledgebase focused on human protein-coding genes and their products. The goal of the IDG-KMC is to develop comprehensive integrated knowledge for the druggable genome in order to illuminate its uncharacterized or poorly annotated portion. The tools derived from the IDG-KMC provide either user-friendly visualizations or ways to impute knowledge about potential targets using machine learning strategies. In the following protocols, we describe how to use each web-based tool to accelerate the illumination of under-studied proteins. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Interacting with the Pharos user interface
Basic Protocol 2: Accessing the data in Harmonizome
Basic Protocol 3: The ARCHS4 resource
Basic Protocol 4: Making predictions about gene function with PrismExp
Basic Protocol 5: Using Geneshot to illuminate knowledge about under-studied targets
Basic Protocol 6: Exploring under-studied targets with TIN-X
Basic Protocol 7: Interacting with the DrugCentral user interface
Basic Protocol 8: Estimating Anti-SARS-CoV-2 activities with DrugCentral REDIAL-2020
Basic Protocol 9: Drug Set Enrichment Analysis using Drugmonizome
Basic Protocol 10: The Drugmonizome-ML Appyter
Basic Protocol 11: The Harmonizome-ML Appyter
Basic Protocol 12: GWAS target illumination with TIGA
Basic Protocol 13: Prioritizing kinases for lists of proteins and phosphoproteins with KEA3
Basic Protocol 14: Converting PubMed searches to drug sets with the DrugShot Appyter.
Subject(s)
Databases, Genetic; Genome; COVID-19; Humans; Machine Learning; Proteins; SARS-CoV-2
ABSTRACT
With increased research funding for Alzheimer's disease (AD) and related disorders across the globe, large amounts of data are being generated. Several studies employed machine learning methods to understand the ever-growing omics data to enhance early diagnosis, map complex disease networks, or uncover potential drug targets. We describe results based on a Target Central Resource Database protein knowledge graph and evidence paths transformed into vectors by metapath matching. We extracted features between specific genes and diseases, then trained and optimized our model using XGBoost, termed MPxgb(AD). To determine our MPxgb(AD) prediction performance, we examined the top twenty predicted genes through an experimental screening pipeline. Our analysis identified potential AD risk genes: FRRS1, CTRAM, SCGB3A1, FAM92B/CIBAR2, and TMEFF2. FRRS1 and FAM92B are considered dark genes, while CTRAM, SCGB3A1, and TMEFF2 are connected to TREM2-TYROBP, IL-1ß-TNFα, and MTOR-APP AD-risk nodes, suggesting relevance to the pathogenesis of AD.
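The MPxgb(AD) abstract describes evidence paths transformed into vectors by metapath matching. As a hedged sketch of that idea, the code below counts, for one gene-disease pair, the walks whose node types follow each metapath in a list, giving one feature per metapath. The node types, the toy edges and the function names are hypothetical (not the TCRD schema), walks may revisit nodes as a simplification, and the XGBoost training step is not shown.

```python
from collections import defaultdict


def build_index(edges):
    """edges: iterable of (src, src_type, dst, dst_type); stored as undirected."""
    adj = defaultdict(set)
    node_type = {}
    for src, src_type, dst, dst_type in edges:
        adj[src].add(dst)
        adj[dst].add(src)
        node_type[src] = src_type
        node_type[dst] = dst_type
    return adj, node_type


def count_metapath(adj, node_type, start, end, metapath):
    """Count walks from start to end whose node types follow metapath,
    e.g. ("Gene", "Pathway", "Gene", "Disease"). Nodes may repeat."""
    if node_type.get(start) != metapath[0]:
        return 0
    frontier = {start: 1}  # node -> number of partial walks ending there
    for wanted_type in metapath[1:]:
        next_frontier = defaultdict(int)
        for node, count in frontier.items():
            for neighbor in adj[node]:
                if node_type[neighbor] == wanted_type:
                    next_frontier[neighbor] += count
        frontier = next_frontier
    return frontier.get(end, 0)


def metapath_features(adj, node_type, gene, disease, metapaths):
    """One count per metapath: a feature vector for a downstream classifier."""
    return [count_metapath(adj, node_type, gene, disease, mp) for mp in metapaths]


adj, types = build_index([
    ("G1", "Gene", "P1", "Pathway"),
    ("G2", "Gene", "P1", "Pathway"),
    ("G2", "Gene", "D1", "Disease"),
])
print(metapath_features(adj, types, "G1", "D1",
                        [("Gene", "Pathway", "Gene", "Disease"), ("Gene", "Disease")]))
```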
Subject(s)
Alzheimer Disease; Alzheimer Disease/diagnosis; Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Early Diagnosis; Humans; Machine Learning; Membrane Proteins/metabolism; Neoplasm Proteins
ABSTRACT
COVID-19 cases have surpassed 109 million, with deaths totaling 2.4 million. Tens of thousands of papers about COVID-19 have been published, along with countless bibliometric analyses of the COVID-19 literature. Despite this, none of these analyses has focused on the domain entities occurring in scientific publications. Analysis of these bio-entities and the relations among them, a strategy called entitymetrics, could offer more insight into knowledge usage and diffusion in specific cases. Thus, this paper presents an entitymetric analysis of the COVID-19 literature. We construct an entity-entity co-occurrence network and employ network indicators to analyze the extracted entities. We find that ACE-2 and C-reactive protein are two very important genes and that lopinavir and ritonavir are two very important chemicals in both rankings.
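The network construction described above can be sketched as follows: entities co-mentioned in the same publication are linked, edge weights count co-mentioning publications, and weighted degree (node strength) serves as one simple importance indicator. The abstract does not say which network indicators were used, so weighted degree is an assumption here, and the entity names are toy placeholders rather than results from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents):
    """Edge weight = number of documents in which two entities are co-mentioned."""
    edges = Counter()
    for entities in documents:
        # Deduplicate and sort so each unordered pair has a single key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Node strength: sum of the weights of all edges touching a node."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength


# Toy input: the set of entities extracted from each of three publications.
docs = [{"gene_a", "drug_x"}, {"gene_a", "gene_b"}, {"gene_a", "drug_x", "drug_y"}]
print(weighted_degree(cooccurrence_network(docs)).most_common(2))
```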
ABSTRACT
Strategies for drug discovery and repositioning are an urgent need with respect to COVID-19. We developed "REDIAL-2020", a suite of machine learning models for estimating small molecule activity from molecular structure, for a range of SARS-CoV-2 related assays. Each classifier is based on three distinct types of descriptors (fingerprint, physicochemical, and pharmacophore) for parallel model development. These models were trained using high throughput screening data from the NCATS COVID19 portal (https://opendata.ncats.nih.gov/covid19/index.html), with multiple categorical machine learning algorithms. The "best models" are combined in an ensemble consensus predictor that outperforms single models where external validation is available. This suite of machine learning models is available through the DrugCentral web portal (http://drugcentral.org/Redial). Acceptable input formats are: drug name, PubChem CID, or SMILES; the output is an estimate of anti-SARS-CoV-2 activities. The web application reports estimated activity across three areas (viral entry, viral replication, and live virus infectivity) spanning six independent models, followed by a similarity search that displays the most similar molecules to the query among experimentally determined data. The ML models have 60% to 74% external predictivity, based on three separate datasets. Complementing the NCATS COVID19 portal, REDIAL-2020 can serve as a rapid online tool for identifying active molecules for COVID-19 treatment. The source code and specific models are available through Github (https://github.com/sirimullalab/redial-2020), or via Docker Hub (https://hub.docker.com/r/sirimullalab/redial-2020) for users preferring a containerized version.
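The abstract states that the best models are combined in an ensemble consensus predictor but not how their outputs are merged. The sketch below assumes a simple majority vote over the per-descriptor models (fingerprint, physicochemical, pharmacophore) together with their mean probability; the function name, the 0.5 threshold and the input values are hypothetical, and the actual REDIAL-2020 rule may differ (see its GitHub repository).

```python
from statistics import mean


def consensus_call(probabilities: dict[str, float], threshold: float = 0.5):
    """Majority vote over per-descriptor models, plus their mean probability.

    probabilities maps a model name (e.g. "fingerprint") to its predicted
    probability that the molecule is active in one assay.
    """
    if not probabilities:
        raise ValueError("need at least one model output")
    votes_active = sum(p >= threshold for p in probabilities.values())
    is_active = votes_active > len(probabilities) / 2  # strict majority
    return is_active, mean(probabilities.values())


# Hypothetical outputs of three descriptor-specific models for one assay.
print(consensus_call({"fingerprint": 0.8, "physicochemical": 0.6, "pharmacophore": 0.3}))
```

Requiring agreement between models built on different descriptor types is one plausible reason an ensemble can outperform any single model on external data.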
ABSTRACT
Pharos is an integrated web-based informatics platform for the analysis of data aggregated by the Illuminating the Druggable Genome (IDG) Knowledge Management Center, an NIH Common Fund initiative. The current version of Pharos (as of October 2019) spans 20,244 proteins in the human proteome, 19,880 disease and phenotype associations, and 226,829 ChEMBL compounds. This resource not only collates and analyzes data from over 60 high-quality resources to generate these data types, but also uses text indexing to find less apparent connections between targets, and has recently begun to collaborate with institutions that generate data and resources. Proteins are ranked according to a knowledge-based classification system, which can help researchers identify less studied "dark" targets that could potentially be further illuminated. This is an important process for both drug discovery and target validation, as more knowledge can accelerate target identification, and previously understudied proteins can serve as novel targets in drug discovery. Two basic protocols illustrate the levels of detail available for targets and several methods of finding targets of interest. An Alternate Protocol illustrates the difference in available knowledge between less and more studied targets. © 2020 by John Wiley & Sons, Inc.
Basic Protocol 1: Search for a target and view details
Alternate Protocol: Search for dark target and view details
Basic Protocol 2: Filter a target list to get refined results.
Subject(s)
Drug Discovery; Genome; Software; Breast Neoplasms/genetics; Drug Delivery Systems; Female; Genome-Wide Association Study; Humans; Ligands; Receptors, G-Protein-Coupled/metabolism
ABSTRACT
A large proportion of biomedical research and the development of therapeutics is focused on a small fraction of the human genome. In a strategic effort to map the knowledge gaps around proteins encoded by the human genome and to promote the exploration of currently understudied, but potentially druggable, proteins, the US National Institutes of Health launched the Illuminating the Druggable Genome (IDG) initiative in 2014. In this article, we discuss how the systematic collection and processing of a wide array of genomic, proteomic, chemical and disease-related resource data by the IDG Knowledge Management Center have enabled the development of evidence-based criteria for tracking the target development level (TDL) of human proteins, which indicates a substantial knowledge deficit for approximately one out of three proteins in the human proteome. We then present spotlights on the TDL categories as well as key drug target classes, including G protein-coupled receptors, protein kinases and ion channels, which illustrate the nature of the unexplored opportunities for biomedical research and therapeutic development.
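The evidence-based TDL criteria mentioned above sort proteins into four levels: Tclin, Tchem, Tbio and Tdark. The sketch below encodes them as a decision procedure. The family-specific potency cutoffs and the PubMed score, GeneRIF and antibody thresholds are assumptions based on the publicly documented TDL scheme and should be verified against the current Pharos documentation; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed Tchem potency cutoffs (nM) by protein family; verify before use.
TCHEM_CUTOFF_NM = {"Kinase": 30.0, "GPCR": 100.0, "Nuclear Receptor": 100.0, "Ion Channel": 10_000.0}
DEFAULT_CUTOFF_NM = 1_000.0  # assumed cutoff for all other families


@dataclass
class TargetEvidence:
    family: str
    targeted_by_approved_drug_moa: bool  # an approved drug acts through this target
    best_activity_nm: Optional[float]    # most potent small-molecule activity, if any
    has_omim_phenotype: bool
    has_experimental_go_leaf: bool       # GO leaf term with experimental evidence
    pubmed_score: float                  # fractional PubMed count
    generif_count: int
    antibody_count: int


def target_development_level(t: TargetEvidence) -> str:
    if t.targeted_by_approved_drug_moa:
        return "Tclin"
    cutoff = TCHEM_CUTOFF_NM.get(t.family, DEFAULT_CUTOFF_NM)
    if t.best_activity_nm is not None and t.best_activity_nm <= cutoff:
        return "Tchem"
    # A target is "dark" when it meets at least two of these low-knowledge signals
    # and has neither an OMIM phenotype nor an experimental GO leaf annotation.
    dark_signals = sum([t.pubmed_score < 5, t.generif_count <= 3, t.antibody_count <= 50])
    if t.has_omim_phenotype or t.has_experimental_go_leaf or dark_signals < 2:
        return "Tbio"
    return "Tdark"


# Hypothetical protein: a GPCR with only weak ligands and sparse literature.
print(target_development_level(TargetEvidence("GPCR", False, 500.0, False, False, 1.2, 1, 12)))
```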