ABSTRACT
Introduction: Pulmonary hypertension (PH) is a pathological condition that affects approximately 1% of the population. The prognosis for many patients is poor, even after treatment. Our knowledge of the pathophysiological mechanisms that cause PH or contribute to its progression is incomplete. Additionally, the mechanism of action of many drugs used to treat PH, including sotatercept, requires elucidation. Methods: Using our graph-powered knowledge mining software Lifelike in combination with a very small patient metabolite data set, we demonstrate how we derive detailed mechanistic hypotheses about PH pathophysiology and the mechanisms of action of clinical drugs. Results: In PH patients, the concentrations of hypoxanthine, 12(S)-HETE, glutamic acid, and sphingosine 1-phosphate are significantly higher, while the concentrations of L-arginine and L-histidine are lower, than in healthy controls. Using the graph-based data analysis, gene ontology, and semantic association capabilities of Lifelike, we connected the differentially expressed metabolites with G-protein signaling and SRC. Then, we associated SRC with IL6 signaling. Subsequently, we found associations that connect SRC and IL6 to activin and BMP signaling. Lastly, we analyzed the mechanisms of action of several existing and novel pharmacological treatments for PH. Lifelike elucidated the interplay between G-protein, IL6, activin, and BMP signaling. These pathways regulate hallmark pathophysiological processes of PH, including vasoconstriction, endothelial barrier function, cell proliferation, and apoptosis. Discussion: The results highlight the importance of SRC, ERK1, AKT, and MLC activity in PH. The molecular pathways affected by existing and novel treatments for PH also converge on these molecules. Importantly, sotatercept affects SRC, ERK1, AKT, and MLC simultaneously.
The present study shows the power of mining knowledge graphs using Lifelike's diverse set of data analytics functionalities for developing knowledge-driven hypotheses on PH pathophysiological and drug mechanisms and their interactions. We believe that Lifelike and our presented approach will be valuable for future mechanistic studies of PH, other diseases, and drugs.
ABSTRACT
Glycosylation represents a major chemical challenge; while it is one of the most common reactions in Nature, conventional chemistry struggles with stereochemistry, regioselectivity, and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction needs to be identified among the tens of thousands of available sequences. Here, we present the glycosyltransferase acceptor specificity predictor (GASP) model, a data-driven approach to the identification of reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent in-house generated data on 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. The performance was stable even in the case of completely new GT1s and acceptors not present in the training data set, highlighting the pan-specificity of GASP. Moreover, the model is capable of parsing all known GT1 sequences, as well as all chemicals, the latter through a pipeline that generates 153 chemical features for a given molecule, taking the CID or SMILES as input (freely available at https://github.com/degnbol/GASP). To investigate the power of GASP, the model prediction probability scores were compared to GT1 substrate conversion yields from a newly published data set, with the top 50% of GASP predictions corresponding to reactions with >50% synthetic yields. The model was also tested in two comparative case studies: glycosylation of the anthelmintic drug niclosamide and the plant defensive compound DIBOA. In the first study, the model achieved an 83% hit rate, outperforming a hit rate of 53% from a random selection assay.
In the second case study, the hit rate of GASP was 50%; although lower than the 83% hit rate obtained with expert-selected enzymes, this is a reasonable performance for cases in which expert opinion is unavailable. The hierarchical importance of the generated chemical features was investigated by negative feature selection, revealing properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a GT1:acceptor predictor that can be trained on other data sets, enabled by the automated feature generation pipelines. We also release the new in-house generated data set used for testing GASP to facilitate the future development of GT1 activity predictors and their robust benchmarking.
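The two headline metrics reported for GASP, AUROC and balanced accuracy, can both be computed from predicted probabilities and binary reactivity labels. The block below is a minimal pure-Python sketch of those two definitions and is not taken from the GASP code base; the function names, the 0.5 decision threshold, and the six toy GT1:acceptor pairs are all illustrative assumptions.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random reactive pair is scored above a random
    non-reactive pair (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Mean of sensitivity and specificity at an assumed decision threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Hypothetical toy data: 1 = reactive pair, 0 = non-reactive pair.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

if __name__ == "__main__":
    print("AUROC:", auroc(toy_labels, toy_scores))
    print("Balanced accuracy:", balanced_accuracy(toy_labels, toy_scores))
```

Balanced accuracy is the more informative of the two when reactive and non-reactive pairs are unevenly represented, because it weights both classes equally regardless of their counts. AUROC, in contrast, does not depend on any single threshold.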
ABSTRACT
The original version of this Comment contained errors in the legend of Figure 2, in which the locations of the fifteenth and sixteenth GBA members were incorrectly given as '(15) Australian Genome Foundry, Macquarie University; (16) Australian Foundry for Advanced Biomanufacturing, University of Queensland.'. The correct version replaces this with '(15) Australian Foundry for Advanced Biomanufacturing (AusFAB), University of Queensland and (16) Australian Genome Foundry, Macquarie University'. This has been corrected in both the PDF and HTML versions of the Comment.