Search | VHL Regional Portal

1.

TIN-X version 3: update with expanded dataset and modernized architecture for enhanced illumination of understudied targets.

Metzger, Vincent T; Cannon, Daniel C; Yang, Jeremy J; Mathias, Stephen L; Bologa, Cristian G; Waller, Anna; Schürer, Stephan C; Vidovic, Dusica; Kelleher, Keith J; Sheils, Timothy K; Jensen, Lars Juhl; Lambert, Christophe G; Oprea, Tudor I; Edwards, Jeremy S.

PeerJ ; 12: e17470, 2024.

Article in English | MEDLINE | ID: mdl-38948230

ABSTRACT

TIN-X (Target Importance and Novelty eXplorer) is an interactive visualization tool for illuminating associations between diseases and potential drug targets and is publicly available at newdrugtargets.org. TIN-X uses natural language processing to identify disease and protein mentions within PubMed content using previously published tools for named entity recognition (NER) of gene/protein and disease names. Target data is obtained from the Target Central Resource Database (TCRD). Two important metrics, novelty and importance, are computed from this data and, when plotted as log(importance) vs. log(novelty), aid the user in visually exploring the novelty of drug targets and their associated importance to diseases. TIN-X Version 3.0 has been significantly improved with an expanded dataset, modernized architecture including a REST API, and an improved user interface (UI). The dataset has been expanded to include not only PubMed publication titles and abstracts, but also full-text articles when available. This results in approximately 9-fold more target/disease associations compared to previous versions of TIN-X. Additionally, the TIN-X database containing this expanded dataset is now hosted in the cloud via Amazon RDS. Recent enhancements to the UI focus on making it more intuitive for users to find diseases or drug targets of interest while providing a new, sortable table-view mode to accompany the existing plot-view mode. UI improvements also help the user browse the associated PubMed publications to explore and understand the basis of TIN-X's predicted association between a specific disease and a target of interest. While implementing these upgrades, computational resources are balanced between the webserver and the user's web browser to achieve adequate performance while accommodating the expanded dataset.
Together, these advances aim to extend the duration that users can benefit from TIN-X while providing both an expanded dataset and new features that researchers can use to better illuminate understudied proteins.
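
The abstract above names two metrics, novelty and importance, and plots them on log axes, but it does not give their formulas. The following is a minimal sketch of how such scores could be derived from text-mined co-mentions, assuming a fractional-counting scheme in which each paper splits one unit of credit among the targets (and diseases) it mentions. The toy `papers` list and the functions `fractional_scores` and `log_coordinates` are hypothetical and are not taken from TIN-X or its REST API.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: the targets and diseases mentioned in each paper.
papers = [
    {"targets": {"T1", "T2"}, "diseases": {"D1"}},
    {"targets": {"T1"}, "diseases": {"D1", "D2"}},
    {"targets": {"T2"}, "diseases": {"D2"}},
]


def fractional_scores(papers):
    """Return (novelty per target, importance per (target, disease) pair).

    Assumed scheme, not confirmed by the abstract: each paper contributes
    1/n_targets to every target it mentions, and 1/(n_targets * n_diseases)
    to every target-disease pair it mentions. Novelty is the reciprocal of
    a target's accumulated fractional mention count.
    """
    target_mentions = defaultdict(float)
    importance = defaultdict(float)
    for paper in papers:
        n_targets = len(paper["targets"])
        n_diseases = len(paper["diseases"])
        for target in paper["targets"]:
            target_mentions[target] += 1.0 / n_targets
            for disease in paper["diseases"]:
                importance[(target, disease)] += 1.0 / (n_targets * n_diseases)
    novelty = {t: 1.0 / count for t, count in target_mentions.items()}
    return novelty, dict(importance)


def log_coordinates(novelty, importance):
    """Map each (target, disease) pair to (log10 novelty, log10 importance)."""
    return {
        (target, disease): (math.log10(novelty[target]), math.log10(score))
        for (target, disease), score in importance.items()
    }


novelty, importance = fractional_scores(papers)
coords = log_coordinates(novelty, importance)
```

Under this assumed scheme, a rarely mentioned target gets a high novelty value, so understudied targets separate from heavily studied ones along the novelty axis of a log-log plot.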

Subject(s)

User-Computer Interface , Humans , Natural Language Processing , PubMed , Software

2.

Node-degree aware edge sampling mitigates inflated classification performance in biomedical random walk-based graph representation learning.

Cappelletti, Luca; Rekerle, Lauren; Fontana, Tommaso; Hansen, Peter; Casiraghi, Elena; Ravanmehr, Vida; Mungall, Christopher J; Yang, Jeremy J; Spranger, Leonard; Karlebach, Guy; Caufield, J Harry; Carmody, Leigh; Coleman, Ben; Oprea, Tudor I; Reese, Justin; Valentini, Giorgio; Robinson, Peter N.

Bioinform Adv ; 4(1): vbae036, 2024.

Article in English | MEDLINE | ID: mdl-38577542

ABSTRACT

Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches to sampling negative edges for training and evaluating classifiers do so by uniformly sampling pairs of nodes. Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using a representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples, then the performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement. Availability and implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.
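
The contrast between uniform and degree-aware negative sampling described in this abstract can be illustrated with a small sketch. It is one simple way to make negative endpoints follow the degree distribution of the positive edges, not the authors' implementation from the linked repository. The function `sample_negatives`, its parameters, and the toy star graph are hypothetical, and the graph is assumed to be undirected.

```python
import random


def sample_negatives(edges, num_samples, degree_aware=True, seed=0):
    """Sample node pairs that are not edges of an undirected graph.

    With degree_aware=False, endpoints are drawn uniformly over nodes.
    With degree_aware=True, endpoints are drawn from the list of positive
    edge endpoints, so each node is picked in proportion to its degree.
    """
    rng = random.Random(seed)
    edge_set = {frozenset(edge) for edge in edges}
    nodes = sorted({node for edge in edges for node in edge})
    # A node appears here once per incident edge, so a uniform draw from
    # this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    pool = endpoints if degree_aware else nodes

    negatives = set()
    attempts = 0
    max_attempts = 1000 * num_samples  # guard against dense graphs
    while len(negatives) < num_samples and attempts < max_attempts:
        attempts += 1
        u, v = rng.choice(pool), rng.choice(pool)
        pair = frozenset((u, v))
        # Reject self-loops, known positive edges, and duplicates.
        if u == v or pair in edge_set or pair in negatives:
            continue
        negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Hypothetical toy graph: one hub connected to five leaves.
edges = [("hub", leaf) for leaf in ["a", "b", "c", "d", "e"]]
uniform_negatives = sample_negatives(edges, 5, degree_aware=False, seed=1)
aware_negatives = sample_negatives(edges, 5, degree_aware=True, seed=1)
```

In a realistic graph with many high-degree nodes, uniform sampling tends to produce negatives between low-degree nodes, whereas drawing endpoints from the positive edge list keeps the degree profile of the negatives closer to that of the positives.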

3.

Novel drug targets in 2023.

Avram, Sorin; Halip, Liliana; Curpan, Ramona; Oprea, Tudor I.

Nat Rev Drug Discov ; 23(5): 330, 2024 05.

Article in English | MEDLINE | ID: mdl-38565953

Subject(s)

Drug Discovery , Humans , Drug Delivery Systems , Molecular Targeted Therapy , Drug Development/methods

4.

Overview of the Knowledge Management Center for Illuminating the Druggable Genome.

Oprea, Tudor I; Bologa, Cristian; Holmes, Jayme; Mathias, Stephen; Metzger, Vincent T; Waller, Anna; Yang, Jeremy J; Leach, Andrew R; Jensen, Lars Juhl; Kelleher, Keith J; Sheils, Timothy K; Mathé, Ewy; Avram, Sorin; Edwards, Jeremy S.

Drug Discov Today ; 29(3): 103882, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38218214

ABSTRACT

The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric data and knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which is served through the web-based informatics platform Pharos. These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.

Subject(s)

Genome , Knowledge Management , Humans , Proteome , Databases, Factual , Informatics

5.

Integration of virtual and physical screening.

Fara, Dan C; Oprea, Tudor I; Prossnitz, Eric R; Bologa, Cristian G; Edwards, Bruce S; Sklar, Larry A.

Drug Discov Today Technol ; 3(4): 377-385, 2006.

Article in English | MEDLINE | ID: mdl-38620118

ABSTRACT

High-throughput screening (HTS) represents the dominant technique for the identification of new lead compounds in current drug discovery. It consists of physical screening (PS) of large libraries of chemicals against one or more specific biological targets. Virtual screening (VS) is a strategy for in silico evaluation of chemical libraries for a given target, and can be integrated to focus the PS process. The present work addresses the integration of PS and VS.
