OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features.

Thafar, Maha A; Albaradei, Somayah; Uludag, Mahmut; Alshahrani, Mona; Gojobori, Takashi; Essack, Magbubah; Gao, Xin.

Front Genet ; 14: 1139626, 2023.

Article in English | MEDLINE | ID: mdl-37091791

ABSTRACT

Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data unveil the proteins involved in these functions. In addition, properties that favor binding between a drug and its target are deducible from the protein's amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the "OncologyTT" datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, BERT embeddings of the proteins' amino acid sequences, and the integrated features, each used separately to train and test the DL classifiers. The models achieved high prediction performance in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. OncoRTT also outperformed the state-of-the-art method, on that method's own data, in five of the seven cancer types assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
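
The abstract reports performance as AUC. As a reminder of what that number measures, the following is a minimal sketch (not the authors' code) that computes the area under the ROC curve from binary labels and classifier scores using the pairwise-ranking formulation. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking formulation.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = known therapeutic target, 0 = non-target.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

This quadratic-time version is only meant to make the definition concrete; for real datasets a sorted-rank implementation or an established library routine would be the more practical choice.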

Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications.

Alshahrani, Mona; Almansour, Abdullah; Alkhaldi, Asma; Thafar, Maha A; Uludag, Mahmut; Essack, Magbubah; Hoehndorf, Robert.

PeerJ ; 10: e13061, 2022.

Article in English | MEDLINE | ID: mdl-35402106

ABSTRACT

Biomedical knowledge is represented in structured databases and published in biomedical literature, and different computational approaches have been developed to exploit each type of information in predictive models. However, the information in structured databases and literature is often complementary. We developed a machine learning method that combines information from literature and databases to predict drug targets and indications. To effectively utilize information in published literature, we integrate knowledge graphs and published literature using named entity recognition and normalization before applying a machine learning model that utilizes the combination of graph and literature. We then use supervised machine learning to show the effects of combining features from biomedical knowledge and published literature on the prediction of drug targets and drug indications. We demonstrate that our approach using datasets for drug-target interactions and drug indications is scalable to large graphs and can be used to improve the ranking of targets and indications by exploiting features from either structured or unstructured information alone.

Subject(s)

Machine Learning , Pattern Recognition, Automated , Drug Interactions , Supervised Machine Learning , Databases, Factual

Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning.

Thafar, Maha A; Alshahrani, Mona; Albaradei, Somayah; Gojobori, Takashi; Essack, Magbubah; Gao, Xin.

Sci Rep ; 12(1): 4751, 2022 03 19.

Article in English | MEDLINE | ID: mdl-35306525

ABSTRACT

Drug-target interaction (DTI) prediction plays a crucial role in drug repositioning and virtual drug screening. Most DTI prediction methods cast the problem as a binary classification task to predict if interactions exist or as a regression task to predict continuous values that indicate a drug's ability to bind to a specific target. The regression-based methods provide insight beyond the binary relationship. However, most of these methods require the three-dimensional (3D) structural information of targets, which is still not generally available for most targets. Despite this bottleneck, only a few methods address the drug-target binding affinity (DTBA) problem from a non-structure-based approach to avoid the 3D structure limitations. Here we propose Affinity2Vec, a novel regression-based method that formulates the entire task as a graph-based problem. To develop this method, we constructed a weighted heterogeneous graph that integrates data from several sources, including drug-drug similarity, target-target similarity, and drug-target binding affinities. Affinity2Vec further combines several computational techniques from feature representation learning, graph mining, and machine learning to generate or extract features, build the model, and predict the binding affinity between the drug and the target with no 3D structural data. We conducted extensive experiments to evaluate and demonstrate the robustness and efficiency of the proposed method on benchmark datasets used in state-of-the-art non-structure-based drug-target binding affinity studies. Affinity2Vec showed superior and competitive results compared to the state-of-the-art methods based on several evaluation metrics, including mean squared error, rm2, concordance index, and area under the precision-recall curve.
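
Two of the evaluation metrics named in the abstract, mean squared error and the concordance index, can be illustrated with a short sketch. This is not the Affinity2Vec implementation; the function names and the toy affinity values are assumptions chosen only to show how the two metrics behave on predicted versus measured binding affinities.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    A pair is comparable when the two true values differ. A pair with tied
    predictions counts as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # ties in the ground truth are not comparable
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical measured and predicted binding affinities for three pairs.
measured = [5.0, 6.0, 7.0]
predicted = [5.1, 5.9, 7.2]
mse = mean_squared_error(measured, predicted)
ci = concordance_index(measured, predicted)
```

Mean squared error rewards predictions that are numerically close to the measured values, while the concordance index only checks whether drug-target pairs are ranked in the right order, which is why the two are usually reported together.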

Subject(s)

Drug Development , Machine Learning , Drug Development/methods , Drug Repositioning

Application and evaluation of knowledge graph embeddings in biomedical data.

Alshahrani, Mona; Thafar, Maha A; Essack, Magbubah.

PeerJ Comput Sci ; 7: e341, 2021.

Article in English | MEDLINE | ID: mdl-33816992

ABSTRACT

Linked data and bio-ontologies enabling knowledge representation, standardization, and dissemination are an integral part of developing biological and biomedical databases. That is, linked data and bio-ontologies are employed in databases to maintain data integrity and data organization and to empower search capabilities. More recently, however, linked data and bio-ontologies have been used to represent information as multi-relational heterogeneous graphs, or "knowledge graphs". This is because entities and relations in the knowledge graph can be represented as embedding vectors in semantic space, and these embedding vectors have been used to predict relationships between entities. Such knowledge graph embedding methods provide a practical approach to data analytics and increase the chances of building machine learning models with high prediction accuracy that can enhance decision support systems. Here, we present a comparative assessment and a standard benchmark for knowledge graph-based representation learning methods focused on the link prediction task for biological relations. We systematically investigated and compared state-of-the-art embedding methods based on the design settings used for training and evaluation. We further tested various strategies aimed at controlling the amount of information related to each relation in the knowledge graph and its effects on the final performance. We also assessed the quality of the knowledge graph features through clustering and visualization and employed several evaluation metrics to examine their uses and differences. Based on this systematic comparison and assessment, we identify and discuss the limitations of knowledge graph-based representation learning methods and suggest some guidelines for the development of improved methods.

Assessing the Outcome of Adult Kidney Transplantation from a Deceased Expanded Criteria Donor: A Descriptive Study.

Alshahrani, Mona; Alotaibi, Mutlaq; Bhutto, Burhan.

Cureus ; 12(10): e11199, 2020 Oct 27.

Article in English | MEDLINE | ID: mdl-33269130

ABSTRACT

Background: End-stage renal disease (ESRD) places a great burden on quality of life. Patients who have undergone kidney transplantation have been reported to have a greater quality of life and better health outcomes. Therefore, it is important to follow well-constructed criteria such as the expanded criteria donor (ECD), in conjunction with the kidney donor profile index (KDPI), to reduce the rates of rejection and death post-transplantation, particularly in elderly patients. Methods: This is a retrospective descriptive study of all patients who received a kidney transplant from a deceased ECD donor, or from an ECD donor with donation after cardiac death (DCD), at St. Joseph Health Care Hospital over a 24-month period from January 2017 to January 2019. All adult recipients of kidneys from standard criteria donors (SCD) and living donors were excluded from the study. Results: The study included 60 patients, 36 (60%) from the ECD group and 24 (40%) from the ECD/DCD group. The most common cause of ESRD among recipients was diabetes mellitus (DM), involving 23 (38.3%) of the patients. The highest creatinine value was recorded in the ECD/DCD group at one month (211 ± 71), and the lowest was also recorded in the ECD/DCD group, at 12 months (160 ± 78). Lastly, only four patients died within 12 months, and only six recipients reported graft loss over 12 months. Conclusion: Descriptive data for the included ECD/DCD recipients showed an increasing trend in recipient survival when used among the elderly, giving more insight into the benefits of ECD/DCD transplantation.

Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes.

Alshahrani, Mona; Hoehndorf, Robert.

Bioinformatics ; 34(17): i901-i907, 2018 09 01.

Article in English | MEDLINE | ID: mdl-30423077

ABSTRACT

Motivation: In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprised of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network. Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE.
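
Once diseases and genes are represented as vectors, prioritization reduces to ranking genes by their similarity to a disease vector. The sketch below illustrates that ranking step only. It assumes cosine similarity as the comparison function and uses made-up two-dimensional vectors and gene names; the abstract describes SmuDGE as a trainable similarity measure, so this is a simplified stand-in rather than the published method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, similarity) pairs sorted from most to least similar."""
    scored = [(gene, cosine(disease_vec, vec)) for gene, vec in gene_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings; real ones would be learned and much higher-dimensional.
gene_embeddings = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.0, 1.0],
    "GENE_C": [1.0, 1.0],
}
disease_embedding = [1.0, 0.1]
ranking = rank_genes(disease_embedding, gene_embeddings)
```

Because the ranking only needs a vector per gene, any gene that can be embedded (for example, through its position in an interaction network) can be scored, even when no phenotype annotations exist for it directly.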

Subject(s)

Disease/genetics , Animals , Humans , Mice , Phenotype , Semantics , Software

Neuro-symbolic representation learning on biological knowledge graphs.

Alshahrani, Mona; Khan, Mohammad Asif; Maddouri, Omar; Kinjo, Akira R; Queralt-Rosinach, Núria; Hoehndorf, Robert.

Bioinformatics ; 33(17): 2723-2730, 2017 Sep 01.

Article in English | MEDLINE | ID: mdl-28449114

ABSTRACT

MOTIVATION: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In recent years, feature learning methods that are applicable to graph-structured data have become available, but they have not yet been widely applied and evaluated on structured biological knowledge. RESULTS: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. AVAILABILITY AND IMPLEMENTATION: https://github.com/bio-ontology-research-group/walking-rdf-and-owl. CONTACT: robert.hoehndorf@kaust.edu.sa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods , Knowledge Bases , Machine Learning , Neural Networks, Computer , Humans

The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants.

Hoehndorf, Robert; Alshahrani, Mona; Gkoutos, Georgios V; Gosline, George; Groom, Quentin; Hamann, Thomas; Kattge, Jens; de Oliveira, Sylvia Mota; Schmidt, Marco; Sierra, Soraya; Smets, Erik; Vos, Rutger A; Weiland, Claus.

J Biomed Semantics ; 7(1): 65, 2016 11 14.

Article in English | MEDLINE | ID: mdl-27842607

ABSTRACT

BACKGROUND: The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text. RESULTS: We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities. CONCLUSIONS: The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established vocabularies or ontologies, but rather serve as an overarching framework based on which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.

Subject(s)

Biological Ontologies , Phenotype , Plants/anatomy & histology , Plants/genetics

Building predictive models for MERS-CoV infections using data mining techniques.

Al-Turaiki, Isra; Alshahrani, Mona; Almutairi, Tahani.

J Infect Public Health ; 9(6): 744-748, 2016.

Article in English | MEDLINE | ID: mdl-27641481

ABSTRACT

BACKGROUND: Recently, the outbreak of MERS-CoV infections drew worldwide attention to Saudi Arabia. The novel virus belongs to the coronavirus family, which is responsible for causing mild to moderate colds. The command and control center of the Saudi Ministry of Health issues a daily report on MERS-CoV infection cases. Infection with MERS-CoV can lead to fatal complications; however, little is known about this novel virus. In this paper, we apply two data mining techniques in order to better understand the stability of, and the possibility of recovery from, MERS-CoV infections. METHOD: The Naive Bayes classifier and the J48 decision tree algorithm were used to build our models. The dataset used consists of 1082 records of cases reported between 2013 and 2015. In order to build our prediction models, we split the dataset into two groups. The first group combined recovery and death records. A new attribute was created to indicate the record type, such that the dataset can be used to predict recovery from MERS-CoV. The second group contained the new case records, to be used to predict the stability of the infection based on the current status attribute. RESULTS: The resulting recovery models indicate that healthcare workers are more likely to survive. This could be due to the vaccinations that healthcare workers are required to get on a regular basis. As for the stability models using J48, two attributes were found to be important for predicting stability: symptomatic and age. Older patients are at high risk of developing MERS-CoV complications. Finally, the performance of all the models was evaluated using three measures: accuracy, precision, and recall. In general, the accuracy of the models is between 53.6% and 71.58%. CONCLUSION: We believe that the performance of the prediction models can be enhanced with the use of more patient data. As future work, we plan to directly contact hospitals in Riyadh in order to collect more information related to patients with MERS-CoV infections.
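
The three evaluation measures named in the abstract (accuracy, precision, and recall) follow directly from the counts in a binary confusion matrix. The sketch below shows that arithmetic; the function name, the label encoding (1 = recovered, 0 = died), and the toy predictions are assumptions for illustration and do not reproduce the study's data or models.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical outcomes: 1 = recovered, 0 = died.
actual = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
metrics = classification_metrics(actual, predicted)
```

Reporting precision and recall alongside accuracy matters for datasets like this one, where one outcome class may be much rarer than the other and accuracy alone can look acceptable for a model that rarely predicts the minority class.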

Subject(s)

Computer Simulation , Coronavirus Infections/mortality , Coronavirus Infections/pathology , Aged , Aged, 80 and over , Data Mining , Female , Humans , Male , Middle Aged , Prognosis , Saudi Arabia , Survival Analysis , Treatment Outcome
