ABSTRACT
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma (NHL) and is characterized by high heterogeneity. Assessment of its prognosis and genetic subtyping holds significant clinical implications. However, existing DLBCL prognostic models are mainly based on transcriptomic profiles, while genetic variation detection is more commonly used in clinical practice. In addition, current clustering-based subtyping methods mostly focus on genes with high mutation frequencies and therefore explain the heterogeneity of DLBCL insufficiently. Here, we propose VNNSurv (https://bio-web1.nscc-gz.cn/app/VNNSurv), a survival model for DLBCL patients based on a biologically informed visible neural network (VNN). VNNSurv achieved an average C-index of 0.72 on the cross-validation set (HMRN cohort, n = 928), outperforming the baseline methods. The interpretability of VNNSurv facilitated the identification of the most impactful genes and the underlying pathways through which they act on patient outcomes. When only the 30 highest-impact genes were used as genetic input, the overall performance of VNNSurv improved, and a C-index of 0.70 was achieved on the external TCGA cohort (n = 48). Leveraging these high-impact genes, including 16 genes with low (<5%) alteration frequencies, we devised a genetic-based prognostic index (GPI) for risk stratification and a subtype identification method. We stratified the patient group according to the International Prognostic Index (IPI) into three risk grades with significant prognostic differences. Furthermore, the defined subtypes exhibited greater prognostic consistency than clustering-based methods. Broadly, VNNSurv is a valuable DLBCL survival model. Its high interpretability has significant value for precision medicine, and its framework is scalable to other diseases.
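The C-index reported above measures how often a model ranks two patients' risks in the same order as their observed survival. The following is a minimal sketch of Harrell's concordance index, not the VNNSurv implementation; the function name `concordance_index` and the toy data are illustrative assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  -- observed follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    risks  -- model risk score per patient (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the patient with the shorter
        # follow-up actually experienced the event.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # correctly ordered pair
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [0.9, 0.7, 0.3, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.72 and 0.70 figures should be read.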
ABSTRACT
Constructing discriminative representations of molecules lies at the core of a number of domains such as drug discovery, chemistry, and medicine. State-of-the-art methods employ graph neural networks and self-supervised learning (SSL) to learn structural representations from unlabeled data, which can then be fine-tuned for downstream tasks. Albeit powerful, these methods are pre-trained solely on molecular structures and thus often struggle with tasks involving intricate biological processes. Here, it is proposed to assist the learning of molecular representations by using high-content microscopy images of perturbed cells at the phenotypic level. To enable cross-modal pre-training, a unified framework is constructed to align the two modalities through multiple types of contrastive loss functions; it proves effective on newly formulated tasks of retrieving molecules from their corresponding images and vice versa. More importantly, the model can infer functional molecules from cellular images generated by genetic perturbations. In parallel, the proposed model transfers non-trivially to molecular property prediction and shows large improvements on clinical outcome prediction. These results suggest that such cross-modality learning can bridge molecules and phenotypes and play an important role in drug discovery.
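As an illustration of how a contrastive loss can align two modalities, the sketch below implements a symmetric InfoNCE loss over paired molecule and image embeddings. This is one common choice and is assumed here for illustration only; the abstract does not specify which loss functions the authors use. The names `symmetric_info_nce`, `mol_emb`, `img_emb`, and `temperature` are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    # Mean cross-entropy where the correct class of row i is column i,
    # i.e. each molecule should match the image at the same index.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(mol_emb, img_emb, temperature=0.1):
    """Average of molecule-to-image and image-to-molecule InfoNCE losses."""
    mol = [_normalize(v) for v in mol_emb]
    img = [_normalize(v) for v in img_emb]
    logits = [
        [sum(a * b for a, b in zip(m, g)) / temperature for g in img]
        for m in mol
    ]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-molecule view
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Hypothetical toy batch: two molecules and their two paired images.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each molecule is most similar to its own image and large when the pairing is wrong, which is what makes mutual retrieval between the two modalities possible after training.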
Subject(s)
Neural Networks, Computer , Humans , Image Processing, Computer-Assisted/methods , Microscopy/methods , Algorithms , Machine Learning
ABSTRACT
Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and developing new therapeutic strategies. Current computational methods mostly predict interactions based on either molecular network or structural information, without integrating them within a unified multi-scale framework. While a few multi-view learning methods are devoted to fusing multi-scale information, these methods tend to rely heavily on a single scale and underfit the others, likely because of the imbalanced nature and inherent greediness of multi-scale learning. To alleviate the optimization imbalance, we present MUSE, a multi-scale representation learning framework based on a variant expectation maximization to optimize different scales in an alternating procedure over multiple iterations. This strategy efficiently fuses multi-scale information between the atomic structure and molecular network scales through mutual supervision and iterative optimization. MUSE outperforms the current state-of-the-art models not only in molecular interaction (protein-protein, drug-protein, and drug-drug) tasks but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.
Subject(s)
Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Algorithms , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Machine Learning , Drug Interactions , Humans , Protein Binding
ABSTRACT
Self-supervised molecular representation learning has demonstrated great promise in bridging machine learning and chemical science to accelerate the development of new drugs. Due to the limited reaction data, existing methods are mostly pretrained by augmenting the intrinsic topology of molecules without effectively incorporating chemical reaction prior information, which makes them difficult to generalize to chemical reaction-related tasks. To address this issue, we propose ReaKE, a reaction knowledge embedding framework, which formulates chemical reactions as a knowledge graph. Specifically, we constructed a chemical synthesis knowledge graph with reactants and products as nodes and reaction rules as the edges. Based on the knowledge graph, we further proposed novel contrastive learning at both molecule and reaction levels to capture the reaction-related functional group information within and between molecules. Extensive experiments demonstrate the effectiveness of ReaKE compared with state-of-the-art methods on several downstream tasks, including reaction classification, product prediction, and yield prediction.
Subject(s)
Machine Learning , Pattern Recognition, Automated
ABSTRACT
Illuminating associations between diseases and genes can help reveal the pathogenesis of syndromes and contribute to treatments, but a large number of associations remain unexplored. To identify novel disease-gene associations, many computational methods have been developed using disease- and gene-related prior knowledge. However, the performance of these methods remains limited by the scarcity of external data sources and the inevitable noise in the prior knowledge. In this study, we have developed a new method, Self-Supervised Mutual Infomax Graph Convolution Network (MiGCN), to predict disease-gene associations under the guidance of external disease-disease and gene-gene collaborative graphs. Noise within the collaborative graphs was eliminated by maximizing the mutual information between nodes and their neighbors through a graphical mutual infomax layer. In parallel, node interactions were strengthened by a novel informative message passing layer to improve the learning ability of the graph neural network. Extensive experiments showed that our model improved on the state-of-the-art method by more than 8% in AUC. The datasets, source code and trained models of MiGCN are available at https://github.com/biomed-AI/MiGCN.
Subject(s)
Learning , Neural Networks, Computer , Humans , Software , Syndrome
ABSTRACT
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Subject(s)
Drug Discovery , Intuition , Humans , Learning
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) is one of the most common malignant tumors worldwide. The difficulty of early diagnosis and the lack of effective treatment are the main reasons for its poor prognosis. Therefore, it is urgent to identify novel diagnostic and therapeutic targets for PDAC patients. m7G methylation is a common type of RNA modification that plays a pivotal role in regulating tumor development. However, the correlation between m7G regulatory genes and PDAC progression remains unclear. By integrating gene expression and related clinical information of PDAC patients from TCGA and GEO cohorts, the m7G binding protein NCBP2 was found to be highly expressed in PDAC patients. More importantly, PDAC patients with high NCBP2 expression had a worse prognosis. Stable NCBP2-knockdown and NCBP2-overexpressing PDAC cell lines were constructed for further in vitro and in vivo experiments. NCBP2 knockdown significantly inhibited PDAC cell proliferation, while overexpression of NCBP2 dramatically promoted PDAC cell growth. Mechanistically, NCBP2 enhanced the translation of c-JUN, which in turn activated MEK/ERK signaling to promote PDAC progression. In conclusion, our study reveals that the m7G reader NCBP2 promotes PDAC progression by activating the MEK/ERK pathway and could serve as a novel therapeutic target for PDAC patients.
ABSTRACT
Cancer-associated fibroblasts (CAFs) are a type of stromal cell in the cholangiocarcinoma (CCA) microenvironment and play crucial roles in cancer development. However, the potential mechanisms of the interaction between CCA cells and CAFs remain obscure. This work investigated the role of circ_0020256 in CAF activation. We showed that circ_0020256 was up-regulated in CCA. High circ_0020256 expression facilitated TGF-β1 secretion from CCA cells, which activated CAFs via the phosphorylation of Smad2/3. Mechanistically, circ_0020256 recruited EIF4A3 protein to stabilize KLF4 mRNA and upregulate its expression; KLF4 then bound to the TGF-β1 promoter and induced its transcription in CCA cells. KLF4 overexpression abrogated the inhibitory effect of circ_0020256 silencing on TGF-β1/Smad2/3-induced CAF activation. Furthermore, CCA cell growth, migration, and epithelial-mesenchymal transition were favored by CAF-secreted IL-6 via autophagy inhibition. We also found that circ_0020256 accelerated CCA tumor growth in vivo. In conclusion, circ_0020256 promoted fibroblast activation to facilitate CCA progression via the EIF4A3/KLF4 pathway, providing a potential point of intervention against CCA progression.
ABSTRACT
Introduction: Immunogenic cell death (ICD) is a form of regulated cell death (RCD) sufficient to trigger an adaptive immune response. According to current findings, ICD has the capacity to alter the tumor immune microenvironment by generating danger signals or damage-associated molecular patterns (DAMPs), which may contribute to immunotherapy. It would be beneficial to develop ICD-related biomarkers that classify individuals according to how well they respond to ICD immunotherapy. Methods and results: We used consensus clustering to identify two ICD-related groupings. The ICD-high subtype was associated with favorable clinical outcomes, significant immune cell infiltration, and strong immune response signaling activity. In addition, we developed and validated an ICD-related prognostic model for PDAC survival based on the tumor immune microenvironment. We also collected clinical and pathological data from 48 patients with PDAC, and patients with high EIF2A expression had a poor prognosis. Finally, based on ICD signatures, we developed a novel PDAC categorization method. This categorization had significant clinical implications for determining prognosis and immunotherapy. Conclusion: Our work emphasizes the connections between ICD subtype variations and alterations in the tumor immune microenvironment in PDAC. These findings may help guide immunotherapy-based treatment for patients with PDAC. We also created and validated an ICD-related prognostic signature, which had a substantial impact on estimating patients' overall survival (OS).
ABSTRACT
Protein function prediction is an essential task in bioinformatics which benefits disease mechanism elucidation and drug target discovery. Due to the explosive growth of proteins in sequence databases and the diversity of their functions, it remains challenging to predict protein functions quickly and accurately from sequences alone. Although many methods have integrated protein structures, biological networks or literature information to improve performance, these extra features are often unavailable for most proteins. Here, we propose SPROF-GO, a Sequence-based alignment-free PROtein Function predictor, which leverages a pretrained language model to efficiently extract informative sequence embeddings and employs self-attention pooling to focus on important residues. The prediction is further advanced by exploiting homology information and accounting for the overlapping communities of proteins with related functions through the label diffusion algorithm. SPROF-GO was shown to surpass state-of-the-art sequence-based and even network-based approaches by more than 14.5, 27.3 and 10.1% in area under the precision-recall curve on the three sub-ontology test sets, respectively. Our method was also demonstrated to generalize well to non-homologous proteins and unseen species. Finally, visualization based on the attention mechanism indicated that SPROF-GO is able to capture sequence domains useful for function prediction. The datasets, source code and trained models of SPROF-GO are available at https://github.com/biomed-AI/SPROF-GO. The SPROF-GO web server is freely available at http://bio-web1.nscc-gz.cn/app/sprof-go.
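The idea of attention pooling over residues can be sketched as follows: each residue embedding receives a scalar score, the scores are turned into weights with a softmax, and the protein-level vector is the weighted sum. This is a simplified single-head illustration under assumed names (`attention_pool`, `residue_emb`, `score_weights`), not the SPROF-GO architecture; in a real model the scoring weights would be learned.

```python
import math


def attention_pool(residue_emb, score_weights):
    """Pool per-residue embeddings into one protein-level vector.

    residue_emb   -- list of L residue embeddings, each of dimension d
    score_weights -- scoring vector of dimension d (learned in practice)
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per residue.
    scores = [sum(x * w for x, w in zip(row, score_weights)) for row in residue_emb]
    # Softmax over residues, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    # Attention-weighted sum of the residue embeddings.
    dim = len(residue_emb[0])
    pooled = [sum(a * row[k] for a, row in zip(attn, residue_emb)) for k in range(dim)]
    return pooled, attn


# Hypothetical toy protein with three residues and 2-dimensional embeddings.
pooled, attn = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [5.0, 0.0])
print(attn)
```

Inspecting the returned attention weights is the kind of visualization the abstract refers to: residues with large weights are the ones the pooled representation depends on most.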
Subject(s)
Proteins , Software , Proteins/metabolism , Algorithms , Computational Biology/methods , Gene Ontology
ABSTRACT
BACKGROUND: Tumor-associated macrophages (TAMs) play a dual role in tumors. However, the factors which drive the function of TAMs in cholangiocarcinoma remain largely undefined. METHODS: SHH signaling pathway and endoplasmic reticulum stress (ERS) indicators were detected in clinical tissues and cholangiocarcinoma cell lines. TAMs were co-cultured with cholangiocarcinoma cells under conditions of hypoxia/normoxia. Polarized TAMs were counted by flow cytometry, and TGF-β1 levels in cell supernatants were detected by ELISA. The effects of glioma-associated oncogene GLI2 on TAMs themselves and on cholangiocarcinoma cells were examined by conducting interference and overexpression assays. RESULTS: The SHH signaling pathway and ERS were both activated in tumor tissues or tumor cell lines under conditions of hypoxia. In co-culture experiments, the presence of cholangiocarcinoma cells increased the proportion of M2-polarized TAMs and the secretion of TGF-β1 by TAMs, while knockdown of SHH expression reversed those increases. Overexpression of GLI2 in TAMs or stimulation of TAMs with Hh-Ag1.5 increased their levels of TGF-β1 expression. Furthermore, under co-culture conditions, interference with GLI2 expression in TAMs reduced the tumor cell migration, invasion, and ER homeostasis induced by Hh-Ag1.5-pretreated TAMs. Under conditions of hypoxia, the presence of cholangiocarcinoma cells promoted the expression of GLI2 and TGF-β1 in TAMs, and in turn, TAMs inhibited the apoptosis and promoted the migration and invasion of cholangiocarcinoma cells. In vivo, an injection of cholangiocarcinoma cells plus TAMs contributed to the growth, EMT, and ER homeostasis of tumor tissue, while an injection of TAMs with GLI2 knockdown had the opposite effects. CONCLUSION: Cholangiocarcinoma cells regulated TAM polarization and TGF-β1 secretion via a paracrine SHH signaling pathway, and in turn, TAMs promoted the growth, EMT, and ER homeostasis of cholangiocarcinoma cells via TGF-β1.