RESUMO
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for fusers to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
Assuntos
Bases de Dados Factuais , Terapia de Alvo Molecular , Proteoma , Humanos , Produtos Biológicos , Descoberta de Drogas , Internet , Proteoma/efeitos dos fármacosRESUMO
In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are: The Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins, by aggregating a multitude of data sources, and ranking targets based on the amount of data available, and presenting data in machine learning ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.
Assuntos
Bases de Dados Factuais , Genoma Humano , Doenças Neurodegenerativas/genética , Proteômica/métodos , Software , Viroses/genética , Animais , Anticonvulsivantes/química , Anticonvulsivantes/uso terapêutico , Antivirais/química , Antivirais/uso terapêutico , Produtos Biológicos/química , Produtos Biológicos/uso terapêutico , Mineração de Dados/estatística & dados numéricos , Interações Hospedeiro-Patógeno/efeitos dos fármacos , Interações Hospedeiro-Patógeno/genética , Humanos , Internet , Aprendizado de Máquina/estatística & dados numéricos , Camundongos , Camundongos Knockout , Terapia de Alvo Molecular/métodos , Doenças Neurodegenerativas/classificação , Doenças Neurodegenerativas/tratamento farmacológico , Doenças Neurodegenerativas/virologia , Mapeamento de Interação de Proteínas , Proteoma/agonistas , Proteoma/antagonistas & inibidores , Proteoma/genética , Proteoma/metabolismo , Bibliotecas de Moléculas Pequenas/química , Bibliotecas de Moléculas Pequenas/uso terapêutico , Viroses/classificação , Viroses/tratamento farmacológico , Viroses/virologiaRESUMO
MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores, to aggregate multivariate evidence.This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Assuntos
Estudo de Associação Genômica Ampla , Iluminação , Genótipo , Polimorfismo de Nucleotídeo Único , FenótipoRESUMO
DrugCentral is a drug information resource (http://drugcentral.org) open to the public since 2016 and previously described in the 2017 Nucleic Acids Research Database issue. Since the 2016 release, 103 new approved drugs were updated. The following new data sources have been included: Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS), FDA Orange Book information, L1000 gene perturbation profile distance/similarity matrices and estimated protonation constants. New and existing entries have been updated with the latest information from scientific literature, drug labels and external databases. The web interface has been updated to display and query new data. The full database dump and data files are available for download from the DrugCentral website.
Assuntos
Bases de Dados de Produtos Farmacêuticos , Aprovação de Drogas/estatística & dados numéricos , Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos , Expressão Gênica/efeitos dos fármacos , Preparações Farmacêuticas/classificação , Proteínas/classificaçãoRESUMO
DrugCentral (http://drugcentral.org) is an open-access online drug compendium. DrugCentral integrates structure, bioactivity, regulatory, pharmacologic actions and indications for active pharmaceutical ingredients approved by FDA and other regulatory agencies. Monitoring of regulatory agencies for new drugs approvals ensures the resource is up-to-date. DrugCentral integrates content for active ingredients with pharmaceutical formulations, indexing drugs and drug label annotations, complementing similar resources available online. Its complementarity with other online resources is facilitated by cross referencing to external resources. At the molecular level, DrugCentral bridges drug-target interactions with pharmacological action and indications. The integration with FDA drug labels enables text mining applications for drug adverse events and clinical trial information. Chemical structure overlap between DrugCentral and five online drug resources, and the overlap between DrugCentral FDA-approved drugs and their presence in four different chemical collections, are discussed. DrugCentral can be accessed via the web application or downloaded in relational database format.
Assuntos
Bases de Dados de Produtos Farmacêuticos , Ferramenta de Busca , Navegador , Aprovação de Drogas , Composição de Medicamentos , Interações Medicamentosas , Rotulagem de Medicamentos , Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos , Humanos , Preparações Farmacêuticas/química , Estados Unidos , United States Food and Drug AdministrationRESUMO
MOTIVATION: The increasing amount of peer-reviewed manuscripts requires the development of specific mining tools to facilitate the visual exploration of evidence linking diseases and proteins. RESULTS: We developed TIN-X, the Target Importance and Novelty eXplorer, to visualize the association between proteins and diseases, based on text mining data processed from scientific literature. In the current implementation, TIN-X supports exploration of data for G-protein coupled receptors, kinases, ion channels, and nuclear receptors. TIN-X supports browsing and navigating across proteins and diseases based on ontology classes, and displays a scatter plot with two proposed new bibliometric statistics: Importance and Novelty. AVAILABILITY AND IMPLEMENTATION: http://www.newdrugtargets.org. CONTACT: cbologa@salud.unm.edu.
Assuntos
Mineração de Dados/métodos , Doença/etiologia , Software , Ontologias Biológicas , Gráficos por Computador , Humanos , Canais Iônicos/metabolismo , Fosfotransferases/metabolismo , Receptores Citoplasmáticos e Nucleares/metabolismo , Receptores Acoplados a Proteínas G/metabolismoRESUMO
TIN-X (Target Importance and Novelty eXplorer) is an interactive visualization tool for illuminating associations between diseases and potential drug targets and is publicly available at newdrugtargets.org. TIN-X uses natural language processing to identify disease and protein mentions within PubMed content using previously published tools for named entity recognition (NER) of gene/protein and disease names. Target data is obtained from the Target Central Resource Database (TCRD). Two important metrics, novelty and importance, are computed from this data and when plotted as log(importance) vs. log(novelty), aid the user in visually exploring the novelty of drug targets and their associated importance to diseases. TIN-X Version 3.0 has been significantly improved with an expanded dataset, modernized architecture including a REST API, and an improved user interface (UI). The dataset has been expanded to include not only PubMed publication titles and abstracts, but also full-text articles when available. This results in approximately 9-fold more target/disease associations compared to previous versions of TIN-X. Additionally, the TIN-X database containing this expanded dataset is now hosted in the cloud via Amazon RDS. Recent enhancements to the UI focuses on making it more intuitive for users to find diseases or drug targets of interest while providing a new, sortable table-view mode to accompany the existing plot-view mode. UI improvements also help the user browse the associated PubMed publications to explore and understand the basis of TIN-X's predicted association between a specific disease and a target of interest. While implementing these upgrades, computational resources are balanced between the webserver and the user's web browser to achieve adequate performance while accommodating the expanded dataset. Together, these advances aim to extend the duration that users can benefit from TIN-X while providing both an expanded dataset and new features that researchers can use to better illuminate understudied proteins.
Assuntos
Interface Usuário-Computador , Humanos , Processamento de Linguagem Natural , PubMed , SoftwareRESUMO
Pharos is an integrated web-based informatics platform for the analysis of data aggregated by the Illuminating the Druggable Genome (IDG) Knowledge Management Center, an NIH Common Fund initiative. The current version of Pharos (as of October 2019) spans 20,244 proteins in the human proteome, 19,880 disease and phenotype associations, and 226,829 ChEMBL compounds. This resource not only collates and analyzes data from over 60 high-quality resources to generate these types, but also uses text indexing to find less apparent connections between targets, and has recently begun to collaborate with institutions that generate data and resources. Proteins are ranked according to a knowledge-based classification system, which can help researchers to identify less studied "dark" targets that could be potentially further illuminated. This is an important process for both drug discovery and target validation, as more knowledge can accelerate target identification, and previously understudied proteins can serve as novel targets in drug discovery. Two basic protocols illustrate the levels of detail available for targets and several methods of finding targets of interest. An Alternate Protocol illustrates the difference in available knowledge between less and more studied targets. © 2020 by John Wiley & Sons, Inc. Basic Protocol 1: Search for a target and view details Alternate Protocol: Search for dark target and view details Basic Protocol 2: Filter a target list to get refined results.
Assuntos
Descoberta de Drogas , Genoma , Software , Neoplasias da Mama/genética , Sistemas de Liberação de Medicamentos , Feminino , Estudo de Associação Genômica Ampla , Humanos , Ligantes , Receptores Acoplados a Proteínas G/metabolismoRESUMO
A large proportion of biomedical research and the development of therapeutics is focused on a small fraction of the human genome. In a strategic effort to map the knowledge gaps around proteins encoded by the human genome and to promote the exploration of currently understudied, but potentially druggable, proteins, the US National Institutes of Health launched the Illuminating the Druggable Genome (IDG) initiative in 2014. In this article, we discuss how the systematic collection and processing of a wide array of genomic, proteomic, chemical and disease-related resource data by the IDG Knowledge Management Center have enabled the development of evidence-based criteria for tracking the target development level (TDL) of human proteins, which indicates a substantial knowledge deficit for approximately one out of three proteins in the human proteome. We then present spotlights on the TDL categories as well as key drug target classes, including G protein-coupled receptors, protein kinases and ion channels, which illustrate the nature of the unexplored opportunities for biomedical research and therapeutic development.
RESUMO
Therapeutic intent, the reason behind the choice of a therapy and the context in which a given approach should be used, is an important aspect of medical practice. There are unmet needs with respect to current electronic mapping of drug indications. For example, the active ingredient sildenafil has 2 distinct indications, which differ solely on dosage strength. In progressing toward a practice of precision medicine, there is a need to capture and structure therapeutic intent for computational reuse, thus enabling more sophisticated decision-support tools and a possible mechanism for computer-aided drug repurposing. The indications for drugs, such as those expressed in the Structured Product Labels approved by the US Food and Drug Administration, appears to be a tractable area for developing an application ontology of therapeutic intent.
Assuntos
Rotulagem de Medicamentos , Tratamento Farmacológico , Vocabulário Controlado , Reposicionamento de Medicamentos , Humanos , Medicina de Precisão , Estados Unidos , United States Food and Drug AdministrationRESUMO
BACKGROUND: One of the most successful approaches to develop new small molecule therapeutics has been to start from a validated druggable protein target. However, only a small subset of potentially druggable targets has attracted significant research and development resources. The Illuminating the Druggable Genome (IDG) project develops resources to catalyze the development of likely targetable, yet currently understudied prospective drug targets. A central component of the IDG program is a comprehensive knowledge resource of the druggable genome. RESULTS: As part of that effort, we have developed a framework to integrate, navigate, and analyze drug discovery data based on formalized and standardized classifications and annotations of druggable protein targets, the Drug Target Ontology (DTO). DTO was constructed by extensive curation and consolidation of various resources. DTO classifies the four major drug target protein families, GPCRs, kinases, ion channels and nuclear receptors, based on phylogenecity, function, target development level, disease association, tissue expression, chemical ligand and substrate characteristics, and target-family specific characteristics. The formal ontology was built using a new software tool to auto-generate most axioms from a database while supporting manual knowledge acquisition. A modular, hierarchical implementation facilitate ontology development and maintenance and makes use of various external ontologies, thus integrating the DTO into the ecosystem of biomedical ontologies. As a formal OWL-DL ontology, DTO contains asserted and inferred axioms. Modeling data from the Library of Integrated Network-based Cellular Signatures (LINCS) program illustrates the potential of DTO for contextual data integration and nuanced definition of important drug target characteristics. DTO has been implemented in the IDG user interface Portal, Pharos and the TIN-X explorer of protein target disease relationships. CONCLUSIONS: DTO was built based on the need for a formal semantic model for druggable targets including various related information such as protein, gene, protein domain, protein structure, binding site, small molecule drug, mechanism of action, protein tissue localization, disease association, and many other types of information. DTO will further facilitate the otherwise challenging integration and formal linking to biological assays, phenotypes, disease models, drug poly-pharmacology, binding kinetics and many other processes, functions and qualities that are at the core of drug discovery. The first version of DTO is publically available via the website http://drugtargetontology.org/ , Github ( http://github.com/DrugTargetOntology/DTO ), and the NCBO Bioportal ( http://bioportal.bioontology.org/ontologies/DTO ). The long-term goal of DTO is to provide such an integrative framework and to populate the ontology with this information as a community resource.
Assuntos
Ontologias Biológicas , Biologia Computacional/métodos , Sistemas de Liberação de Medicamentos/métodos , Descoberta de Drogas/métodos , Humanos , Proteínas/classificação , Proteínas/genética , Proteínas/metabolismo , Semântica , SoftwareRESUMO
Many bioactivity databases offer information regarding the biological activity of small molecules on protein targets. Information in these databases is often hard to resolve with certainty because of subsetting different data in a variety of formats; use of different bioactivity metrics; use of different identifiers for chemicals and proteins; and having to access different query interfaces, respectively. Given the multitude of data sources, interfaces and standards, it is challenging to gather relevant facts and make appropriate connections and decisions regarding chemical-protein associations. The CARLSBAD database has been developed as an integrated resource, focused on high-quality subsets from several bioactivity databases, which are aggregated and presented in a uniform manner, suitable for the study of the relationships between small molecules and targets. In contrast to data collection resources, CARLSBAD provides a single normalized activity value of a given type for each unique chemical-protein target pair. Two types of scaffold perception methods have been implemented and are available for datamining: HierS (hierarchical scaffolds) and MCES (maximum common edge subgraph). The 2012 release of CARLSBAD contains 439 985 unique chemical structures, mapped onto 1,420 889 unique bioactivities, and annotated with 277 140 HierS scaffolds and 54 135 MCES chemical patterns, respectively. Of the 890 323 unique structure-target pairs curated in CARLSBAD, 13.95% are aggregated from multiple structure-target values: 94 975 are aggregated from two bioactivities, 14 544 from three, 7 930 from four and 2214 have five bioactivities, respectively. CARLSBAD captures bioactivities and tags for 1435 unique chemical structures of active pharmaceutical ingredients (i.e. 'drugs'). CARLSBAD processing resulted in a net 17.3% data reduction for chemicals, 34.3% reduction for bioactivities, 23% reduction for HierS and 25% reduction for MCES, respectively. The CARLSBAD database supports a knowledge mining system that provides non-specialists with novel integrative ways of exploring chemical biology space to facilitate knowledge mining in drug discovery and repurposing. Database URL: http://carlsbad.health.unm.edu/carlsbad/.
Assuntos
Bases de Dados de Compostos Químicos , Mineração de Dados , Internet , Proteínas/metabolismo , Interface Usuário-ComputadorRESUMO
Finding new uses for old drugs is a strategy embraced by the pharmaceutical industry, with increasing participation from the academic sector. Drug repurposing efforts focus on identifying novel modes of action, but not in a systematic manner. With intensive data mining and curation, we aim to apply bio- and cheminformatics tools using the DRUGS database, containing 3,837 unique small molecules annotated on 1,750 proteins. These are likely to serve as drug targets and antitargets (i.e., associated with side effects, SE). The academic community, the pharmaceutical sector and clinicians alike could benefit from an integrated, semantic-web compliant computer-aided drug repurposing (CADR) effort, one that would enable deep data mining of associations between approved drugs (D), targets (T), clinical outcomes (CO) and SE. We report preliminary results from text mining and multivariate statistics, based on 7,684 approved drug labels, ADL (Dailymed) via text mining. From the ADL corresponding to 988 unique drugs, the "adverse reactions" section was mapped onto 174 SE, then clustered via principal component analysis into a 5x5 self-organizing map that was integrated into a Cytoscape network of SE-D-T-CO. This type of data can be used to streamline drug repurposing and may result in novel insights that can lead to the identification of novel drug actions.