ABSTRACT
CD4+ T cells are central to various immune responses, but the molecular programs that drive and maintain CD4+ T cell immunity are not entirely clear. Here we identify a stem-like program that governs the CD4+ T cell response in transplantation models. Single-cell-transcriptomic analysis revealed that naive alloantigen-specific CD4+ T cells develop into TCF1hi effector precursor (TEP) cells and TCF1-CXCR6+ effectors in transplant recipients. The TCF1-CXCR6+CD4+ effectors lose proliferation capacity and do not reject allografts upon adoptive transfer into secondary hosts. By contrast, the TCF1hiCD4+ TEP cells have dual features of self-renewal and effector differentiation potential, and allograft rejection depends on continuous replenishment of TCF1-CXCR6+ effectors from TCF1hiCD4+ TEP cells. Mechanistically, TCF1 sustains the CD4+ TEP cell population, whereas the transcription factor IRF4 and the glycolytic enzyme LDHA govern the effector differentiation potential of CD4+ TEP cells. Deletion of IRF4 or LDHA in T cells induces transplant acceptance. These findings unravel a stem-like program that controls the self-renewal capacity and effector differentiation potential of CD4+ TEP cells and have implications for T cell-related immunotherapies.
Subjects
Gene Expression Regulation, Regulatory T Lymphocytes, Cell Differentiation
ABSTRACT
Iron antimonide (FeSb2) has been investigated for decades due to its puzzling electronic properties. It undergoes a temperature-controlled transition from an insulator to an ill-defined metal, with a crossover from diamagnetism to paramagnetism. Extensive efforts have been made to uncover the underlying mechanism, but a consensus has yet to be reached. While macroscopic transport and magnetic measurements can be explained by different theoretical proposals, the essential spectroscopic evidence required to distinguish the physical origin is missing. In this paper, through the use of X-ray absorption spectroscopy and atomic multiplet simulations, we have observed the mixed spin states of the 3d^6 configuration in FeSb2. Furthermore, we reveal that the enhancement of the conductivity, whether induced by temperature or doping, is characterized by populating the high-spin state from the low-spin state. Our work constitutes vital spectroscopic evidence that the electrical/magnetic transition in FeSb2 is directly associated with the spin-state excitation.
ABSTRACT
Drug-target interactions (DTIs) are a key part of the drug development process, and their accurate and efficient prediction can significantly boost development efficiency and reduce development time. Recent years have witnessed the rapid advancement of deep learning, resulting in an abundance of deep learning-based models for DTI prediction. However, most of these models use a single representation of drugs and proteins, making it difficult to comprehensively represent their characteristics. Multimodal data fusion can effectively compensate for the limitations of single-modal data. However, existing multimodal models for DTI prediction do not take into account both intra- and inter-modal interactions simultaneously, resulting in limited representation capabilities of fused features and a reduction in DTI prediction accuracy. A hierarchical multimodal self-attention-based graph neural network for DTI prediction, called HMSA-DTI, is proposed to address multimodal feature fusion. HMSA-DTI takes drug SMILES, drug molecular graphs, protein sequences and protein 2-mer sequences as inputs, and utilizes a hierarchical multimodal self-attention mechanism to achieve deep fusion of multimodal features of drugs and proteins, enabling the capture of intra- and inter-modal interactions between drugs and proteins. HMSA-DTI shows significant advantages over other baseline methods on multiple evaluation metrics across five benchmark datasets.
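As an illustration of the "protein 2-mer sequences" input mentioned above, the following minimal sketch splits a protein sequence into k-mers. The abstract does not say whether the 2-mers overlap, so a stride of 1 (overlapping windows) is an assumption here, and the function name and the toy sequence are hypothetical.

```python
def kmer_tokens(sequence: str, k: int = 2) -> list[str]:
    """Return the overlapping k-mers of a sequence, in order (stride 1 assumed)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


protein = "MKTAYIA"  # hypothetical toy sequence, not taken from the paper
print(kmer_tokens(protein))
```

A sequence of length n yields n - 1 overlapping 2-mers, which can then be mapped to integer indices before being fed to an embedding layer.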
Subjects
Deep Learning, Computer Neural Networks, Proteins/chemistry, Proteins/metabolism, Humans, Algorithms, Computational Biology/methods
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) enables the exploration of biological heterogeneity among different cell types within tissues at single-cell resolution. Inferring cell types within tissues is foundational for downstream research. Most existing methods for cell type inference based on scRNA-seq data primarily utilize highly variable genes (HVGs) with higher expression levels as clustering features, overlooking the contribution of HVGs with lower expression levels. To address this, we have designed a novel cell type inference method for scRNA-seq data, termed scLEGA. scLEGA employs a novel zero-inflated negative binomial (ZINB) loss function that fully considers the contribution of genes with lower expression levels and combines two distinct scRNA-seq clustering strategies through a multi-head attention mechanism. It utilizes a low-expression optimized denoising autoencoder, based on the novel ZINB model, to extract low-dimensional features and handle dropout events, and a GCN-based graph autoencoder (GAE) that leverages neighbor information to guide dimensionality reduction. The iterative fusion of denoising and topological embedding in scLEGA facilitates the acquisition of cluster-friendly cell representations in the hidden embedding, where similar cells are brought closer together. Compared to 12 state-of-the-art cell type inference methods on 15 scRNA-seq datasets, scLEGA demonstrates superior performance in clustering accuracy, scalability, and stability. The scLEGA model code is freely available at https://github.com/Masonze/scLEGA-main.
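For orientation, the sketch below computes the negative log-likelihood of a single count under the standard ZINB model, in which a zero is either a dropout (probability pi) or a draw from a negative binomial with mean mu and dispersion theta. This is the textbook form only; the low-expression-aware modification used by scLEGA is not described in the abstract and is not reproduced here. All function and parameter names are assumptions.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial (mean mu, dispersion theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of count x under the standard ZINB model."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        # A zero can come from dropout (pi) or from the NB component itself.
        prob = pi + (1.0 - pi) * nb
    else:
        prob = (1.0 - pi) * nb
    return -math.log(prob)


# Hypothetical parameter values, chosen only for illustration.
print(zinb_nll(0, mu=2.0, theta=1.5, pi=0.3))
print(zinb_nll(3, mu=2.0, theta=1.5, pi=0.3))
```

In an autoencoder setting, mu, theta and pi would be per-gene outputs of the decoder, and the loss would be this quantity averaged over all cells and genes.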
Subjects
RNA-Seq, Single-Cell Gene Expression Analysis, Humans, Algorithms, Cluster Analysis, Computational Biology/methods, RNA-Seq/methods, Software
ABSTRACT
Cell communication plays a crucial role in the growth and development of multicellular organisms, in immune processes, and in the maintenance of the organism's internal environment. It exerts a significant influence on regulating internal cellular states such as gene expression and cell functionality. Currently, the mainstream methods for studying intercellular communication focus on the ligand-receptor-transcription factor and ligand-receptor-subunit scales. However, there is relatively limited research on the association between intercellular communication and highly variable genes (HVGs). As some HVGs are closely related to cell communication, accurately identifying these HVGs can enhance the accuracy of constructing cell communication networks. The rapid development of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics technologies provides a data foundation for exploring the relationship between intercellular communication and HVGs. Therefore, we propose CPPLS-MLP, which can identify HVGs closely related to intercellular communication and further analyze the impact of Multiple Input Multiple Output cellular communication on the differential expression of these HVGs. By comparing with CCPLS, a commonly used method for constructing intercellular communication networks, we validated the superior performance of our method in identifying cell-type-specific HVGs and effectively analyzing the influence of neighboring cell types on HVG expression regulation. Source code for the CPPLS_MLP R and Python packages and the related scripts is available at 'CPPLS_MLP Github [https://github.com/wuzhenao/CPPLS-MLP]'.
Subjects
Cell Communication, Single-Cell Analysis, Single-Cell Analysis/methods, Transcriptome, Gene Expression Profiling/methods, Humans, Computational Biology/methods, Gene Regulatory Networks, Animals, Software, Algorithms
ABSTRACT
Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains all microorganisms from an environmental sample. Correctly identifying viruses from these mixed sequences is critical in viral analyses. It is common to identify long viral sequences that have already passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This ignores the relationships between the subsequences, leading to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification performance for long viral sequences by constructing relationships among the short subsequences derived from long ones. VirGrapher treats a long sequence as a graph and uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes from sequences after a GCN-based node embedding model. VirGrapher achieves a higher AUC and accuracy on the validation set than three benchmark methods.
Subjects
Metagenome, Microbiota, Microbiota/genetics, Benchmarking
ABSTRACT
Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
Subjects
Biological Products, Biosynthetic Pathways, Deep Learning, Biological Products/metabolism, Algorithms, Computational Biology/methods, Humans
ABSTRACT
The field of computational drug repurposing aims to uncover novel therapeutic applications for existing drugs through high-throughput data analysis. However, there is a scarcity of drug repurposing methods leveraging the cellular-level information provided by single-cell RNA sequencing data. To address this need, we propose DrugReSC, an innovative approach to drug repurposing utilizing single-cell RNA sequencing data, intending to target specific cell subpopulations critical to disease pathology. DrugReSC constructs a drug-by-cell matrix representing the transcriptional relationships between individual cells and drugs and utilizes permutation-based methods to assess drug contributions to cellular phenotypic changes. We demonstrate DrugReSC's superior performance compared to existing drug repurposing methods based on bulk or single-cell RNA sequencing data across multiple cancer case studies. In summary, DrugReSC offers a novel perspective on the utilization of single-cell sequencing data in drug repurposing methods, contributing to the advancement of precision medicine for cancer.
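The permutation-based assessment mentioned above can be illustrated with a generic one-sided permutation test. The abstract does not specify the statistic DrugReSC permutes, so the difference in mean drug score between two groups of cells is an assumption, as are the function name and the toy data.

```python
import random


def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """One-sided p-value for mean(score | label 1) - mean(score | label 0)."""
    rng = random.Random(seed)

    def mean_diff(current_labels):
        pos = [s for s, lab in zip(scores, current_labels) if lab == 1]
        neg = [s for s, lab in zip(scores, current_labels) if lab == 0]
        return sum(pos) / len(pos) - sum(neg) / len(neg)

    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the score-label association
        if mean_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-cell drug scores for phenotype-positive (1) and other (0) cells.
scores = [5.1, 4.8, 5.3, 5.0, 1.0, 1.2, 0.9, 1.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_p_value(scores, labels))
```

Shuffling the labels keeps the group sizes fixed, so the null distribution reflects only the loss of association between a cell's score and its phenotype.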
Subjects
Drug Repositioning, Neoplasms, Single-Cell Analysis, Transcriptome, Drug Repositioning/methods, Humans, Neoplasms/drug therapy, Neoplasms/genetics, Neoplasms/pathology, Neoplasms/metabolism, Single-Cell Analysis/methods, Computational Biology/methods, RNA Sequence Analysis/methods, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use
ABSTRACT
Bacteriophages are viruses that infect bacterial cells. They are the most diverse biological entities on earth and play important roles in the microbiome. According to their lifestyle, phages can be divided into virulent phages and temperate phages. Classifying virulent and temperate phages is crucial for further understanding of phage-host interactions. Although several methods have been designed for phage lifestyle classification, they consider either sequence features or gene features alone, leading to low accuracy. A new computational method, DeePhafier, is proposed to improve classification performance on phage lifestyle. Built from several multilayer self-attention neural networks and a global self-attention neural network, and combined with protein features from the Position Specific Scoring Matrix, DeePhafier improves the classification accuracy and outperforms two benchmark methods. The accuracy of DeePhafier on five-fold cross-validation is as high as 87.54% for sequences with length >2000 bp.
Subjects
Bacteriophages, Computer Neural Networks, Bacteriophages/genetics, Computational Biology/methods, Viral Proteins/genetics, Viral Proteins/metabolism, Algorithms
ABSTRACT
Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been proven to better characterize SVs, SVs detected from noisy long-read data still include a considerable portion of false-positive calls. To accurately detect SVs in long-read data, we present SVDF, a method that employs a learning-based noise filtering strategy and an SV signature-adaptive clustering algorithm, for effectively reducing the likelihood of false-positive events. Benchmarking results from multiple orthogonal experiments demonstrate that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy for each sample compared to several existing general SV calling tools. We believe that, with its meticulous and sensitive SV detection capability, SVDF can bring new opportunities and advancements to cutting-edge genomic research.
Subjects
Algorithms, Humans, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Genomics/methods, Genomic Structural Variation, Software
ABSTRACT
While significant strides have been made in predicting neoepitopes that trigger autologous CD4+ T cell responses, accurately identifying antigen presentation by human leukocyte antigen (HLA) class II molecules remains a challenge. This identification is critical for developing vaccines and cancer immunotherapies. Current prediction methods are limited, primarily due to a lack of high-quality training epitope datasets and algorithmic constraints. To predict exogenous HLA class II-restricted peptides across most of the human population, we utilized mass spectrometry data to profile >223 000 eluted ligands over HLA-DR, -DQ, and -DP alleles. Here, by integrating these data with peptide processing and gene expression, we introduce HLAIImaster, an attention-based deep learning framework with adaptive domain knowledge for predicting neoepitope immunogenicity. Leveraging diverse biological characteristics and our enhanced deep learning framework, HLAIImaster significantly outperforms existing tools in terms of positive predictive value across various neoantigen studies. Robust domain knowledge learning accurately identifies neoepitope immunogenicity, bridging the gap between neoantigen biology and the clinical setting and paving the way for future neoantigen-based therapies to provide greater clinical benefit. In summary, we present a comprehensive exploitation of the immunogenic neoepitope repertoire of cancers, facilitating the effective development of "just-in-time" personalized vaccines.
Subjects
Deep Learning, Histocompatibility Antigens Class II, Humans, Histocompatibility Antigens Class II/immunology, Epitopes/immunology, Computational Biology/methods, T-Lymphocyte Epitopes/immunology
ABSTRACT
Single-cell proteomics enables the direct quantification of protein abundance at single-cell resolution, providing valuable insights into cellular phenotypes beyond what can be inferred from transcriptome analysis alone. However, the lack of large-scale integrated databases hinders researchers from accessing and exploring single-cell proteomics, impeding the advancement of this field. To fill this gap, we present a comprehensive database, the Single-cell Proteomic DataBase (SPDB, https://scproteomicsdb.com/), for general single-cell proteomic data, including antibody-based and mass spectrometry-based single-cell proteomics. Equipped with standardized data processing and a user-friendly web interface, SPDB provides unified data formats for convenient interaction with downstream analysis, and offers not only dataset-level but also protein-level data search and exploration capabilities. To enable detailed exhibition of single-cell proteomic data, SPDB also provides a module for visualizing data from the perspectives of cell metadata or protein features. The current version of SPDB encompasses 133 antibody-based single-cell proteomic datasets involving more than 300 million cells and over 800 marker/surface proteins, and 10 mass spectrometry-based single-cell proteomic datasets involving more than 4000 cells and over 7000 proteins. Overall, SPDB is envisioned as a useful resource that will benefit the wider research community by providing detailed insights into proteomics from the single-cell perspective.
Subjects
Proteins, Proteomics, Antibodies, Knowledge Bases, Mass Spectrometry, Humans, Animals, Single-Cell Analysis
ABSTRACT
Retinal Müller glia (MG) can act as stem-like cells to generate new neurons in both zebrafish and mice. In zebrafish, retinal regeneration is innate and robust, resulting in the replacement of lost neurons and restoration of visual function. In mice, exogenous stimulation of MG is required to reveal a dormant and, to date, limited regenerative capacity. Zebrafish studies have been key in revealing factors that promote regenerative responses in the mammalian eye. Increased understanding of how the regenerative potential of MG is regulated in zebrafish may therefore aid efforts to promote retinal repair therapeutically. Developmental signaling pathways are known to coordinate regeneration following widespread retinal cell loss. In contrast, less is known about how regeneration is regulated in the context of retinal degenerative disease, i.e., following the loss of specific retinal cell types. To address this knowledge gap, we compared transcriptomic responses underlying regeneration following targeted loss of rod photoreceptors or bipolar cells. In total, 2,531 differentially expressed genes (DEGs) were identified, with the majority being paradigm specific, including during early MG activation phases, suggesting the nature of the injury/cell loss informs the regenerative process from initiation onward. For example, early modulation of Notch signaling was implicated in the rod but not bipolar cell ablation paradigm and components of JAK/STAT signaling were implicated in both paradigms. To examine candidate gene roles in rod cell regeneration, including several immune-related factors, CRISPR/Cas9 was used to create G0 mutant larvae (i.e., "crispants"). Rod cell regeneration was inhibited in stat3 crispants, while mutating stat5a/b, c7b and txn accelerated rod regeneration kinetics. 
These data support emerging evidence that discrete responses follow from selective retinal cell loss and that the immune system plays a key role in regulating "fate-biased" regenerative processes.
Subjects
Transcriptome, Zebrafish, Animals, Mice, Zebrafish/genetics, Genetically Modified Animals, Transcriptome/genetics, Retina/metabolism, Neurons, Cell Proliferation, Mammals
ABSTRACT
Accurate prediction of promoter regions driving miRNA gene expression has become a major challenge due to the lack of annotation information for pri-miRNA transcripts. This defect hinders our understanding of miRNA-mediated regulatory networks. Some algorithms have been designed during the past decade to detect miRNA promoters. However, these methods rely on biosignal data such as CpG islands and still need to be improved. Here, we propose miProBERT, a BERT-based model for predicting promoters directly from gene sequences without using any structural or biological signals. To our knowledge, this is the first time a BERT-based model has been employed to identify miRNA promoters. We take the pre-trained model DNABERT, fine-tune it on the gene promoter dataset so that its representation includes richer information about the biological properties of promoter sequences, and then systematically scan the upstream regions of each intergenic miRNA using the fine-tuned model. About 665 miRNA promoters are found. The innovative use of a random substitution strategy to construct a negative dataset improves the discriminative ability of the model and further reduces the false positive rate (FPR) to as low as 0.0421. On independent datasets, miProBERT outperformed other gene promoter prediction methods. In a comparison on 33 experimentally validated miRNA promoter datasets, miProBERT significantly outperformed previously developed miRNA promoter prediction programs, with 78.13% precision and 75.76% recall. We further verify the predicted promoter regions by analyzing conservation, CpG content and histone marks. The effectiveness and robustness of miProBERT are highlighted.
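The reported precision and recall can be combined into a single F1 score, the harmonic mean of the two. The sketch below is only a derived illustration: the abstract does not report F1, and the function name is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, written as fractions.
f1 = f1_score(0.7813, 0.7576)
print(round(f1, 3))
```

For these inputs the value is about 0.769, slightly closer to the lower of the two metrics, as expected for a harmonic mean.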
Subjects
MicroRNAs, MicroRNAs/metabolism, Genetic Promoter Regions, Algorithms, CpG Islands
ABSTRACT
Identifying gene regulatory networks (GRNs) at the resolution of single cells has long been a great challenge, and the advent of single-cell multi-omics data provides unprecedented opportunities to construct GRNs. Here, we propose a novel strategy that integrates omics datasets from single-cell ribonucleic acid sequencing and single-cell Assay for Transposase-Accessible Chromatin using sequencing, and uses an unsupervised learning neural network to divide the samples with high copy number variation scores, which are used to infer the GRN in each gene block. Accuracy validation of the proposed strategy shows that approximately 80% of transcription factors are directly associated with cancer, colorectal cancer, malignancy and disease according to TRRUST, and that most transcription factors are prone to produce multiple transcript variants and lead to tumorigenesis according to the RegNetwork database. The source code is available at: https://github.com/Cuily-v/Colorectal_cancer.
Subjects
Colorectal Neoplasms, Gene Regulatory Networks, Humans, Multiomics, DNA Copy Number Variations, Algorithms, Transcription Factors/genetics, Colorectal Neoplasms/genetics
ABSTRACT
Identifying disease-gene associations is a fundamental and critical biomedical task for understanding molecular mechanisms and for the diagnosis and treatment of diseases. It is time-consuming and expensive to experimentally verify causal links between diseases and genes. Recently, deep learning methods have achieved tremendous success in identifying candidate genes for genetic diseases. The gene prediction problem can be modeled as a link prediction problem based on the features of nodes and edges of the gene-disease graph. However, most existing studies either build homogeneous networks based on a single data source, or build heterogeneous networks based on multi-source data and manually define meta-paths to learn the network representation of diseases and genes. The former cannot make use of abundant multi-source heterogeneous information, while the latter needs domain knowledge and experience when defining meta-paths, and the accuracy of the model largely depends on the definition of meta-paths. To address these bottlenecks, we propose an end-to-end disease-gene association prediction model with a parallel graph transformer network (DGP-PGTN), which deeply integrates the heterogeneous information of diseases, genes, ontologies and phenotypes. DGP-PGTN can automatically and comprehensively capture the multiple latent interactions between diseases and genes, discover the causal relationship between them and is fully interpretable at the same time. We conduct comprehensive experiments and show that DGP-PGTN significantly outperforms state-of-the-art methods on the task of disease-gene association prediction. Furthermore, DGP-PGTN can automatically learn the implicit relationship between diseases and genes without manually defining meta-paths.
Subjects
Algorithms, Computer Neural Networks, Phenotype
ABSTRACT
Accurate and effective drug-target interaction (DTI) prediction can greatly shorten the drug development lifecycle and reduce the cost of drug development. In the deep-learning-based paradigm for predicting DTI, robust drug and protein feature representations and their interaction features play a key role in improving the accuracy of DTI prediction. Additionally, the class imbalance problem and the overfitting problem in the drug-target dataset can also affect the prediction accuracy, and reducing the consumption of computational resources and speeding up the training process are also critical considerations. In this paper, we propose shared-weight-based MultiheadCrossAttention, a precise and concise attention mechanism that can establish the association between target and drug, making our models more accurate and faster. Then, we use the cross-attention mechanism to construct two models: MCANet and MCANet-B. In MCANet, the cross-attention mechanism is used to extract the interaction features between drugs and proteins for improving the feature representation ability of drugs and proteins, and the PolyLoss loss function is applied to alleviate the overfitting problem and the class imbalance problem in the drug-target dataset. In MCANet-B, the robustness of the model is improved by combining multiple MCANet models and prediction accuracy further increases. We train and evaluate our proposed methods on six public drug-target datasets and achieve state-of-the-art results. In comparison with other baselines, MCANet saves considerable computational resources while maintaining accuracy in the leading position; however, MCANet-B greatly improves prediction accuracy by combining multiple models while maintaining a balance between computational resource consumption and prediction accuracy.
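To make the PolyLoss term concrete, the sketch below implements the Poly-1 form, which adds a term epsilon * (1 - p_t) to the ordinary cross-entropy, where p_t is the predicted probability of the true class. The abstract does not state which PolyLoss variant or which epsilon MCANet uses, so both are assumptions here, as is the function name.

```python
import math


def poly1_cross_entropy(probs, target, epsilon=1.0):
    """Poly-1 loss for one sample: cross-entropy plus epsilon * (1 - p_t)."""
    p_t = probs[target]  # predicted probability of the true class
    cross_entropy = -math.log(p_t)
    return cross_entropy + epsilon * (1.0 - p_t)


# Hypothetical binary prediction: interaction (class 1) vs no interaction (class 0).
probs = [0.2, 0.8]
print(poly1_cross_entropy(probs, target=1))
print(poly1_cross_entropy(probs, target=1, epsilon=0.0))  # plain cross-entropy
```

With epsilon set to 0 the expression reduces to standard cross-entropy; a positive epsilon increases the penalty on samples whose true-class probability is still low.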
Subjects
Drug Development, Drug Discovery, Drug Discovery/methods, Proteins/metabolism, Drug Delivery Systems, Protein Domains
ABSTRACT
Silencers are noncoding DNA sequence fragments located on the genome that suppress gene expression. The variation of silencers in specific cells is closely related to gene expression and cancer development. Computational approaches that exclusively rely on DNA sequence information for silencer identification fail to account for the cell specificity of silencers, resulting in diminished accuracy. Despite the discovery of several transcription factors and epigenetic modifications associated with silencers on the genome, there is still no definitive biological signal or combination thereof to fully characterize silencers, posing challenges in selecting suitable biological signals for their identification. Therefore, we propose a sophisticated deep learning framework called DeepICSH, which is based on multiple biological data sources. Specifically, DeepICSH leverages a deep convolutional neural network to automatically capture biologically relevant signal combinations strongly associated with silencers, originating from a diverse array of biological signals. Furthermore, the utilization of attention mechanisms facilitates the scoring and visualization of these signal combinations, whereas the employment of skip connections facilitates the fusion of multilevel sequence features and signal combinations, thereby empowering the accurate identification of silencers within specific cells. Extensive experiments on HepG2 and K562 cell line data sets demonstrate that DeepICSH outperforms state-of-the-art methods in silencer identification. Notably, we introduce for the first time a deep learning framework based on multi-omics data for classifying strong and weak silencers, achieving favorable performance. In conclusion, DeepICSH shows great promise for advancing the study and analysis of silencers in complex diseases. The source code is available at https://github.com/lyli1013/DeepICSH.
Subjects
Deep Learning, Human Genome, Humans, Cell Line, Genetic Epigenesis, Multiomics
ABSTRACT
With the emergence of spatial transcriptome sequencing (ST-seq), research now heavily relies on the joint analysis of ST-seq and single-cell RNA sequencing (scRNA-seq) data to precisely identify cell spatial composition in tissues. However, common methods for combining these datasets often merge data from multiple cells to generate pseudo-ST data, overlooking topological relationships and failing to represent spatial arrangements accurately. We introduce GTAD, a method utilizing the Graph Attention Network for deconvolution of integrated scRNA-seq and ST-seq data. GTAD effectively captures cell spatial relationships and topological structures within tissues using a graph-based approach, enhancing cell-type identification and our understanding of complex tissue cellular landscapes. By integrating scRNA-seq and ST data into a unified graph structure, GTAD outperforms traditional 'pseudo-ST' methods, providing robust and information-rich results. GTAD performs exceptionally well with synthesized spatial data and accurately identifies cell spatial composition in tissues like the mouse cerebral cortex, cerebellum, developing human heart and pancreatic ductal carcinoma. GTAD holds the potential to enhance our understanding of tissue microenvironments and cellular diversity in complex biological systems. The source code is available at https://github.com/zzhjs/GTAD.
Subjects
Single-Cell Gene Expression Analysis, Software, Humans, Animals, Mice
ABSTRACT
MOTIVATION: The prediction of drug-target interaction is a vital task in the biomedical field, aiding in the discovery of potential molecular targets of drugs and the development of targeted therapy methods with higher efficacy and fewer side effects. Although there are various methods for drug-target interaction (DTI) prediction based on heterogeneous information networks, these methods face challenges in capturing the fundamental interaction between drugs and targets and ensuring the interpretability of the model. Moreover, they require manually constructed meta-paths or extensive feature engineering (prior knowledge), whereas graph generation can fuse information more flexibly without meta-path selection. RESULTS: We propose a causal enhanced method for drug-target interaction (CE-DTI) prediction that integrates graph generation and multi-source information fusion. First, we represent drugs and targets by modeling the fusion of their multi-source information through automatic graph generation. Once drugs and targets are combined, a network of drug-target pairs (DTPs) is constructed, transforming the prediction of drug-target interactions into a node classification problem. Specifically, the surrounding nodes that influence the central node are separated into two groups: causal and non-causal variable nodes. Causal variable nodes significantly impact the central node's classification, while non-causal variable nodes do not. Causal invariance is then used to enhance the contrastive learning of the DTP network. Our method demonstrates excellent performance compared with other competitive benchmark methods across multiple datasets. The experimental results also show that the causal enhancement strategy can explore the potential causal effects between DTPs and discover new potential targets. Additionally, case studies demonstrate that this method can identify potential drug targets.
AVAILABILITY AND IMPLEMENTATION: The source code of AdaDR is available at: https://github.com/catly/CE-DTI.