Search | BVS Search Portal

1.

RPEMHC: improved prediction of MHC-peptide binding affinity by a deep learning approach based on residue-residue pair encoding.

Wang, Xuejiao; Wu, Tingfang; Jiang, Yelu; Chen, Taoning; Pan, Deng; Jin, Zhi; Xie, Jingxin; Quan, Lijun; Lyu, Qiang.

Bioinformatics ; 40(1), 2024 01 02.

Article in English | MEDLINE | ID: mdl-38175759

ABSTRACT

MOTIVATION: Binding of peptides to major histocompatibility complex (MHC) molecules plays a crucial role in triggering T cell recognition mechanisms essential for immune response. Accurate prediction of MHC-peptide binding is vital for the development of cancer therapeutic vaccines. While recent deep learning-based methods have achieved significant performance in predicting MHC-peptide binding affinity, most of them encode MHC molecules and peptides separately as inputs, potentially overlooking critical interaction information between the two. RESULTS: In this work, we propose RPEMHC, a new deep learning approach based on residue-residue pair encoding to predict the binding affinity between peptides and MHC, which encodes an MHC molecule and a peptide as a residue-residue pair map. We evaluate the performance of RPEMHC on various MHC-II-related datasets for MHC-peptide binding prediction, demonstrating that RPEMHC achieves better or comparable performance against other state-of-the-art baselines. Moreover, we conduct further experiments on MHC-I-related datasets, and the results demonstrate that our method works for both MHC classes. These extensive validations show that RPEMHC is an effective tool for studying MHC-peptide interactions and can potentially facilitate vaccine development. AVAILABILITY: The source code of the method along with trained models is freely available at https://github.com/lennylv/RPEMHC.
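
As an illustration of what a residue-residue pair map can look like, the following minimal Python sketch assigns one integer id to every (MHC residue, peptide residue) combination. The abstract does not specify the actual encoding used by RPEMHC, so the 20-letter alphabet, the id scheme, and the name `pair_map` are assumptions made for this example only.

```python
# Hypothetical illustration of a residue-residue pair map; not RPEMHC's code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_map(mhc_seq, peptide):
    """Return a len(mhc_seq) x len(peptide) grid of pair ids in [0, 400).

    Cell (i, j) identifies the ordered pair (mhc_seq[i], peptide[j]),
    so the two molecules are encoded jointly rather than separately.
    """
    n = len(AMINO_ACIDS)
    return [[AA_INDEX[m] * n + AA_INDEX[p] for p in peptide] for m in mhc_seq]


# Toy sequences, chosen only to show the shape of the output.
grid = pair_map("YFAM", "SIINFEKL")
print(len(grid), len(grid[0]))  # rows = MHC residues, columns = peptide residues
```

In a model, such ids would typically be passed through an embedding layer so that each cell of the map becomes a learned vector; that step is outside the scope of this sketch.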

Subjects

Deep Learning , Protein Binding , Peptides/chemistry , Major Histocompatibility Complex , Histocompatibility Antigens Class I/metabolism

2.

CAPLA: improved prediction of protein-ligand binding affinity by a deep learning approach based on a cross-attention mechanism.

Jin, Zhi; Wu, Tingfang; Chen, Taoning; Pan, Deng; Wang, Xuejiao; Xie, Jingxin; Quan, Lijun; Lyu, Qiang.

Bioinformatics ; 39(2), 2023 02 03.

Article in English | MEDLINE | ID: mdl-36688724

ABSTRACT

MOTIVATION: Accurate and rapid prediction of protein-ligand binding affinity is a great challenge currently encountered in drug discovery. Recent advances have shown that deep learning-based computational approaches are a promising alternative for accurately quantifying binding affinity. The structural complementarity between the protein-binding pocket and the ligand has a great effect on the binding strength between a protein and a ligand, but most existing deep learning approaches extract the features of the pocket and the ligand with two detached modules. RESULTS: In this work, a new deep learning approach based on the cross-attention mechanism, named CAPLA, was developed for improved prediction of protein-ligand binding affinity by learning features from sequence-level information of both protein and ligand. Specifically, CAPLA employs the cross-attention mechanism to capture the mutual effect of the protein-binding pocket and the ligand. We evaluated the performance of CAPLA in comprehensive benchmarking experiments on binding affinity prediction, demonstrating the superior performance of CAPLA over state-of-the-art baseline approaches. Moreover, we provided interpretability for CAPLA to uncover critical functional residues that contribute most to the binding affinity through the analysis of the attention scores generated by the cross-attention mechanism. Consequently, these results indicate that CAPLA is an effective approach for binding affinity prediction and may be useful for downstream applications. AVAILABILITY AND IMPLEMENTATION: The source code of the method along with trained models is freely available at https://github.com/lennylv/CAPLA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
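
The idea of cross-attention between pocket and ligand can be sketched in plain Python as scaled dot-product attention in which pocket residues act as queries and ligand tokens act as keys and values. This is a generic illustration, not CAPLA's implementation: the function names and toy vectors are hypothetical, and the learned projection matrices a real model would use are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all keys.

    Returns (outputs, weights); weights[i][j] is how strongly query i
    (e.g. a pocket residue) attends to key j (e.g. a ligand token).
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy feature vectors (assumed, for illustration only).
pocket = [[1.0, 0.0], [0.0, 1.0]]
ligand = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pocket_ctx, attn = cross_attention(pocket, ligand, ligand)
```

Inspecting `attn` row by row is the kind of analysis the abstract refers to when it mentions attention scores as a route to interpretability.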

Subjects

Deep Learning , Ligands , Proteins/chemistry , Protein Binding , Software

3.

The role of microstructure of extracellular proteins in dewaterability of alkaline pretreatment sludge during bioleaching.

Li, Yunbei; Quan, Lijun; Li, Jingyu; Zhang, Zhiwen; Lv, Jinghua; Fu, Chunyan; Chen, Zhiqiang.

Environ Res ; 244: 117969, 2024 Mar 01.

Article in English | MEDLINE | ID: mdl-38109956

ABSTRACT

Alkaline pre-treatment is known to enhance the acid production efficiency of sludge but adversely affects its dewatering performance. In this study, the improvement of sludge dewaterability by a novel bioleaching system with inoculating domesticated acidified sludge (AS) and its underlying mechanism were investigated. The results showed that although the addition of Fe²⁺ and the reduction of pH improved the dewatering performance of sludge, their effects were inferior to that of AS + Fe. The addition of AS and Fe²⁺ significantly reduced the specific resistance to filtration and capillary suction time of the sludge by 98.6% and 95.5%, respectively. This improvement in dewatering performance was achieved through the combined actions of bio-acidification, bio-oxidation, and bio-flocculation. Remarkably, under alkaline pH, microorganisms in AS remained active, leading to the formation of iron-based bioflocculants, along with a rapid pH decrease. These bioflocculants, in combination with protein (PN) in tightly bound extracellular polymeric substances (TB-EPS) through amide bonding, transformed TB-EPS from extractable to non-extractable form, reducing PN content from 12.1 mg g⁻¹ DS to 5.09 mg g⁻¹ DS and altering the protein's secondary structure. Consequently, the gel-like TB-EPS matrix effectively broke down, releasing cellular water and significantly enhancing sludge dewaterability.

Subjects

Sewage , Water , Water/chemistry , Iron/chemistry , Filtration , Oxidation-Reduction , Waste Disposal, Fluid/methods

4.

TransPPMP: predicting pathogenicity of frameshift and non-sense mutations by a Transformer based on protein features.

Nie, Liangpeng; Quan, Lijun; Wu, Tingfang; He, Ruji; Lyu, Qiang.

Bioinformatics ; 38(10): 2705-2711, 2022 05 13.

Article in English | MEDLINE | ID: mdl-35561183

ABSTRACT

MOTIVATION: Protein structure can be severely disrupted by frameshift and non-sense mutations at specific positions in the protein sequence. Frameshift and non-sense mutation cases can also be found in healthy individuals. A method to distinguish neutral and potentially disease-associated frameshift and non-sense mutations is of practical and fundamental importance. It would allow researchers to rapidly screen out the potentially pathogenic sites from a large number of mutated genes and then use these sites as drug targets to speed up diagnosis and improve access to treatment. The problem of how to distinguish between neutral and potentially disease-associated frameshift and non-sense mutations remains under-researched. RESULTS: We built a Transformer-based neural network model, named TransPPMP, to predict the pathogenicity of frameshift and non-sense mutations based on protein features. The feature matrix of contextual sequences computed by the ESM pre-training model, the type of mutation residue and the auxiliary features, including structure and function information, are combined as input features, and the focal loss function is used to address the sample imbalance problem during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP showed robust performance and clear advantages in all evaluation metrics compared with four other advanced methods, namely, ENTPRISE-X, VEST-indel, DDIG-in and CADD. In addition, we demonstrate the usefulness of the multi-head attention mechanism in Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but functional sites with a large influence on the mutated residue can also be captured by attention focus. These could offer useful clues to study the pathogenicity mechanism of human complex diseases for which traditional machine learning methods fall short. AVAILABILITY AND IMPLEMENTATION: TransPPMP is available at https://github.com/lennylv/TransPPMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
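
For readers unfamiliar with focal loss, a minimal sketch of the standard binary form, FL = -alpha_t (1 - p_t)^gamma log(p_t), is given below. The abstract does not report the hyperparameters or the exact variant used in TransPPMP, so `gamma=2.0` and `alpha=0.25` are common defaults assumed here purely for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample.

    p: predicted probability of the positive class, y: true label (0 or 1).
    gamma, alpha, eps are assumed illustrative defaults, not TransPPMP's.
    """
    p = min(max(p, eps), 1.0 - eps)           # keep log() finite
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: strongly down-weighted
hard = focal_loss(0.6, 1)   # uncertain: contributes far more to the loss
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which makes the effect of the focusing term easy to check.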

Subjects

Frameshift Mutation , Software , Humans , Mutation , Neural Networks, Computer

5.

Identifying modifications on DNA-bound histones with joint deep learning of multiple binding sites in DNA sequence.

Li, Yan; Quan, Lijun; Zhou, Yiting; Jiang, Yelu; Li, Kailong; Wu, Tingfang; Lyu, Qiang.

Bioinformatics ; 38(17): 4070-4077, 2022 09 02.

Article in English | MEDLINE | ID: mdl-35809058

ABSTRACT

MOTIVATION: Histone modifications are epigenetic markers that impact gene expression by altering the chromatin structure or recruiting histone modifiers. Their accurate identification is key to unraveling the mechanisms by which they regulate gene expression. However, the solutions for this task can be improved by exploiting multiple relationships in the dataset and exploring designs of learning models, for example joint learning techniques. RESULTS: This article proposes a deep learning-based multi-objective computational approach, iHMnBS, to identify which of the seven typical histone modifications a DNA sequence may bind, and which parts of the DNA sequence bind to them. iHMnBS employs a customized dataset that allows the marking of modifications contained in histones that may bind to any position in the DNA sequence. iHMnBS tries to mine the information implicit in this richer data by means of deep neural networks. In comprehensive comparisons, iHMnBS outperforms a baseline method, and the probability of binding to modified histones assigned to a representative nucleotide of a DNA sequence can serve as a reference for biological experiments. Since the interaction between transcription factors and histone modifications has an important role in gene expression, we extracted a number of sequence patterns that may bind to transcription factors, and explored their possible impact on disease. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/lennylv/iHMnBS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Deep Learning , Histones , Histones/metabolism , Base Sequence , Binding Sites , DNA/chemistry , Transcription Factors/metabolism

6.

Simultaneous Prediction of Interaction Sites on the Protein and Peptide Sides of Complexes through Multilayer Graph Convolutional Networks.

Li, Kailong; Quan, Lijun; Jiang, Yelu; Wu, Hongjie; Wu, Jian; Li, Yan; Zhou, Yiting; Wu, Tingfang; Lyu, Qiang.

J Chem Inf Model ; 63(7): 2251-2262, 2023 04 10.

Article in English | MEDLINE | ID: mdl-36989086

ABSTRACT

Identifying the binding residues of protein-peptide complexes is essential for understanding protein function mechanisms and exploring drug discovery. Recently, many computational methods have been developed to predict the interaction sites of either protein or peptide. However, to our knowledge, no prediction method can simultaneously identify the interaction sites on both the protein and peptide sides. Here, we propose a deep graph convolutional network (GCN)-based method called GraphPPepIS to predict the interaction sites of protein-peptide complexes using protein and peptide structural information. We also propose a companion method, SeqPPepIS, to cope with the lack of structural information and the flexibility of peptides. SeqPPepIS replaces the peptide structural features in GraphPPepIS by learning features from peptide sequences. We performed a comprehensive evaluation on benchmark data sets, and the results show that our two methods outperform state-of-the-art methods in accurately identifying the interaction sites on both the protein and peptide sides. We show that our methods can help improve protein-peptide docking. For docking data sets, our methods maintain robust performance in identifying binding sites, thereby enhancing the prediction of peptide binding poses. Finally, we visualized the protein and peptide graph embeddings to demonstrate the learning ability of graph convolution in predicting interaction sites, which was mainly obtained through the shared parameters of the protein graph and the peptide graph.

Subjects

Benchmarking , Peptides , Amino Acid Sequence , Binding Sites , Drug Discovery

7.

DeepMPSF: A Deep Learning Network for Predicting General Protein Phosphorylation Sites Based on Multiple Protein Sequence Features.

Xie, Jingxin; Quan, Lijun; Wang, Xuejiao; Wu, Hongjie; Jin, Zhi; Pan, Deng; Chen, Taoning; Wu, Tingfang; Lyu, Qiang.

J Chem Inf Model ; 63(22): 7258-7271, 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-37931253

ABSTRACT

Phosphorylation, as one of the most important post-translational modifications, plays a key role in various cellular physiological processes and disease occurrences. In recent years, computational methods have been gradually applied to the prediction of protein phosphorylation sites. However, most existing methods rely on simple protein sequence features that provide limited contextual information. To overcome this limitation, we propose DeepMPSF, a phosphorylation site prediction model based on multiple protein sequence features. There are two types of features: sequence semantic features, which comprise protein residue type information and relative position information within the protein sequence, and protein background biophysical features, which include global semantic information containing more comprehensive protein background information obtained from pretrained models. To extract these features, DeepMPSF employs two separate subnetworks: the SSFE module and the BBFE module, which automatically extract high-level semantic features. Our model incorporates a learning strategy for handling imbalanced datasets through ensemble learning during training and prediction. DeepMPSF is trained and evaluated on a well-established dataset of human proteins. Comparison with other benchmark methods shows that DeepMPSF outperforms them in predicting both S/T residues and Y residues. In particular, DeepMPSF showed excellent generalization performance in cross-species blind tests, with an average improvement of 5.63%/5.72%, 22.28%/25.94%, 20.11%/17.49%, and 26.40%/28.33% for Mus musculus/Rattus norvegicus test sets in the area under the ROC curve (AUC), the AUC of the PR curve, F1-score, and MCC, respectively. Furthermore, it also shows excellent performance in the latest updated case of natural proteins with functional phosphorylation sites. Through an ablation study and visual analysis, we find that the design of the different feature modules contributes significantly to the classification accuracy of DeepMPSF, which provides valuable insights for predicting phosphorylation sites and offers effective support for future downstream research.

Subjects

Deep Learning , Mice , Animals , Humans , Rats , Phosphorylation , Proteins/chemistry , Amino Acid Sequence , Protein Processing, Post-Translational

8.

Multisource Attention-Mechanism-Based Encoder-Decoder Model for Predicting Drug-Drug Interaction Events.

Pan, Deng; Quan, Lijun; Jin, Zhi; Chen, Taoning; Wang, Xuejiao; Xie, Jingxin; Wu, Tingfang; Lyu, Qiang.

J Chem Inf Model ; 62(23): 6258-6270, 2022 Dec 12.

Article in English | MEDLINE | ID: mdl-36449561

ABSTRACT

Many computational methods have been proposed to predict drug-drug interactions (DDIs), which can occur when combining drugs to treat various diseases, but most mainly utilize single-source features of drugs, which is inadequate for drug representation. To fill this gap, we propose two attention-mechanism-based encoder-decoder models that incorporate multisource information: one is MAEDDI, which can predict DDIs, and the other is MAEDDIE, which can make further DDI-associated event predictions for drug pairs with DDIs. To better express the drug features, we used three encoding methods to encode the drugs, integrating the self-attention mechanism, cross-attention mechanism, and graph attention network to construct a multisource feature fusion network. Experiments showed that both MAEDDI and MAEDDIE performed better than some state-of-the-art methods in various validation attempts at different experimental tasks. The visualization analysis showed that the semantic features of drug pairs learned by our models provided a good drug representation. In practice, MAEDDIE successfully screened 43 DDI events on favipiravir, an influenza antiviral drug, with a success rate of nearly 50%. Our model achieved competitive results, mainly owing to the design of sequence-based, structural, biochemical, and statistical multisource features. Moreover, different encoders constructed based on different features learn the interrelationship information between drug pairs, and the different representations of these drug pairs are incorporated to predict the target problem. All of these encoders were designed to better characterize the complex DDI relationships, allowing us to achieve high generalization in DDI and DDI-associated event predictions.

Subjects

Semantics , Drug Interactions

9.

Cluster-Transition Determining Sites Underlying the Antigenic Evolution of Seasonal Influenza Viruses.

Quan, Lijun; Ji, Chengyang; Ding, Xiao; Peng, Yousong; Liu, Mi; Sun, Jiya; Jiang, Taijiao; Wu, Aiping.

Mol Biol Evol ; 36(6): 1172-1186, 2019 06 01.

Article in English | MEDLINE | ID: mdl-30851115

ABSTRACT

Seasonal influenza viruses undergo frequent mutations on their surface hemagglutinin (HA) proteins to escape the host immune response. Among these mutations, a few key amino acid sites are associated with significant antigenic cluster transitions. To recognize the cluster-transition determining sites of seasonal influenza A/H3N2 and A/H1N1 viruses systematically and quickly, we developed a computational model named RECDS (recognition of cluster-transition determining sites) to evaluate the contribution of a specific amino acid site on the HA protein over the whole history of antigenic evolution. In RECDS, we ranked all of the HA sites by calculating the contribution scores derived from the forest of gradient boosting classifiers trained on various sequence- and structure-based features. With the RECDS model, we found that the sites determining influenza antigenicity were mostly around the receptor-binding domain for both the influenza A/H3N2 and A/H1N1 viruses. Specifically, half of the cluster-transition determining sites of the influenza A/H1N1 virus were located in the vestigial esterase domain and basic path area on the HA, which indicated that the driving force of the antigenic evolution of the A/H1N1 virus differs from that of the A/H3N2 virus. Beyond that, the footprints of substitutions responsible for antigenic evolution were inferred according to the phylogenetic trees for the cluster-transition determining sites. Large-scale monitoring of genetic variation occurring at these cluster-transition determining sites in circulating influenza viruses could reduce current assay workloads in influenza surveillance and in the selection of new influenza vaccine strains.

Subjects

Antigens, Viral/genetics , Evolution, Molecular , Hemagglutinins/genetics , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H3N2 Subtype/genetics , Algorithms , Genetic Techniques , Influenza A Virus, H1N1 Subtype/immunology , Influenza A Virus, H3N2 Subtype/immunology , Software

10.

STRUM: structure-based prediction of protein stability changes upon single-point mutation.

Quan, Lijun; Lv, Qiang; Zhang, Yang.

Bioinformatics ; 32(19): 2936-46, 2016 10 01.

Article in English | MEDLINE | ID: mdl-27318206

ABSTRACT

MOTIVATION: Mutations in the human genome arise mainly through single nucleotide polymorphisms, some of which can affect the stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability, but most require features from experimental structures. Given the fast progress in protein structure prediction, this work explores the possibility of improving mutation-induced stability change prediction using low-resolution structure modeling. RESULTS: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error of 1.2 kcal/mol in the mutation-based cross-validations. The PCC decreases when training and test mutations are separated so that they come from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility of using low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. AVAILABILITY AND IMPLEMENTATION: http://zhanglab.ccmb.med.umich.edu/STRUM/ CONTACT: qiang@suda.edu.cn and zhng@umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
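
The two evaluation measures quoted in the RESULTS, the Pearson correlation coefficient and the root-mean-square error between predicted and measured ΔΔG, can be computed as in the short sketch below. The function names and the toy values are hypothetical; they are not STRUM data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmse(predicted, measured):
    """Root-mean-square error, in the same unit as the inputs (e.g. kcal/mol)."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Toy ddG values in kcal/mol, invented for illustration only.
predicted_ddg = [-1.2, 0.3, -2.5, -0.8]
measured_ddg = [-1.0, 0.5, -3.1, -0.2]
print(pearson(predicted_ddg, measured_ddg), rmse(predicted_ddg, measured_ddg))
```

The two numbers answer different questions: PCC measures how well the ranking and trend of the predictions follow the measurements, while RMSE measures the typical size of the error in energy units.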

Subjects

Models, Molecular , Protein Stability , Algorithms , Humans , Point Mutation , Proteins , Software , Structure-Activity Relationship

11.

Improved packing of protein side chains with parallel ant colonies.

Quan, Lijun; Lü, Qiang; Li, Haiou; Xia, Xiaoyan; Wu, Hongjie.

BMC Bioinformatics ; 15 Suppl 12: S5, 2014.

Article in English | MEDLINE | ID: mdl-25474164

ABSTRACT

INTRODUCTION: The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many existing solutions are modelled as a computational optimisation problem. Beyond the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. METHODS: We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamers by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. RESULTS: We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains. CONCLUSIONS: This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.
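
The accuracy criterion quoted in the RESULTS (a side-chain dihedral angle counts as correct when it lies within 40° of the X-ray value) can be sketched as follows. The helper names are hypothetical, and the wrap-around handling is an assumption about how angle differences are measured, not a detail taken from pacoPacker.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees.

    Handles wrap-around, so 170 and -170 differ by 20, not 340.
    """
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def fraction_within(predicted, reference, tolerance=40.0):
    """Share of predicted angles within `tolerance` degrees of the reference."""
    hits = sum(1 for p, r in zip(predicted, reference)
               if angular_difference(p, r) <= tolerance)
    return hits / len(predicted)


# Toy dihedral angles in degrees, invented for illustration only.
predicted_chi1 = [60.0, 170.0, -60.0]
xray_chi1 = [65.0, -170.0, 60.0]
accuracy = fraction_within(predicted_chi1, xray_chi1)
```

A combined criterion over two dihedrals would additionally require both angles of the same residue to pass the tolerance; that extension is left out of this sketch.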

Subjects

Algorithms , Protein Conformation , Amino Acids/chemistry , Computational Biology/methods , Models, Molecular , Proteins/chemistry , Sequence Analysis, Protein

12.

MultiModRLBP: A Deep Learning Approach for Multi-Modal RNA-Small Molecule Ligand Binding Sites Prediction.

Wang, Junkai; Quan, Lijun; Jin, Zhi; Wu, Hongjie; Ma, Xuhao; Wang, Xuejiao; Xie, Jingxin; Pan, Deng; Chen, Taoning; Wu, Tingfang; Lyu, Qiang.

IEEE J Biomed Health Inform ; PP, 2024 May 13.

Article in English | MEDLINE | ID: mdl-38739505

ABSTRACT

This study aims to tackle the intricate challenge of predicting RNA-small molecule binding sites to explore the potential value in the field of RNA drug targets. To address this challenge, we propose the MultiModRLBP method, which integrates multi-modal features using deep learning algorithms. These features include 3D structural properties at the nucleotide base level of the RNA molecule, relational graphs based on overall RNA structure, and rich RNA semantic information. In our investigation, we gathered 851 interactions between RNA and small-molecule ligands from the RNAglib dataset and the RLBind training set. Unlike conventional training sets, this collection broadened its scope by including RNA complexes that have the same RNA sequence but different binding sites due to structural differences or the presence of different ligands. This enhancement enables the MultiModRLBP model to more accurately capture subtle changes at the structural level, ultimately improving its ability to discern nuances among similar RNA conformations. Furthermore, we evaluated MultiModRLBP on two classic test sets, Test18 and Test3, highlighting its performance disparities on small molecules based on metal and non-metal ions. Additionally, we conducted a structural sensitivity analysis on specific complex categories, considering RNA instances with varying degrees of structural changes and whether they share the same ligands. The results indicate that MultiModRLBP outperforms the current state-of-the-art methods on multiple classic test sets, particularly excelling in predicting binding sites for non-metal ions and instances where the binding sites are widely distributed along the sequence. MultiModRLBP can also be used as a potential tool when the RNA structure is perturbed or the experimental RNA tertiary structure is not available. Most importantly, MultiModRLBP exhibits the capability to distinguish binding characteristics of RNAs that are structurally diverse yet similar in sequence. These advancements hold promise in reducing the costs associated with the development of RNA-targeted drugs.

13.

How Deepbics Quantifies Intensities of Transcription Factor-DNA Binding and Facilitates Prediction of Single Nucleotide Variant Pathogenicity With a Deep Learning Model Trained On ChIP-Seq Data Sets.

Quan, Lijun; Chu, Xiaomin; Sun, Xiaoyu; Wu, Tingfang; Lyu, Qiang.

IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1594-1599, 2023.

Article in English | MEDLINE | ID: mdl-35471887

ABSTRACT

The binding of DNA sequences to cell type-specific transcription factors is essential for regulating gene expression in all organisms. Many variants occurring in these binding regions play crucial roles in human disease by disrupting the cis-regulation of gene expression. We first implemented a sequence-based deep learning model called deepBICS to quantify the intensity of transcription factor-DNA binding. The experimental results not only showed the superiority of deepBICS on ChIP-seq data sets but also suggested that deepBICS, as a language model, could help the classification of disease-related and neutral variants. We then built a language model-based method called deepBICS4SNV to predict the pathogenicity of single nucleotide variants. The good performance of deepBICS4SNV on two tests related to Mendelian disorders and viral diseases shows that the sequence contextual information derived from language models can improve prediction accuracy and generalization capability.

Subjects

Chromatin Immunoprecipitation Sequencing , Deep Learning , Humans , Virulence , Binding Sites/genetics , DNA/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Nucleotides

14.

ctP²ISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation.

Li, Kailong; Quan, Lijun; Jiang, Yelu; Li, Yan; Zhou, Yiting; Wu, Tingfang; Lyu, Qiang.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 297-306, 2023.

Article in English | MEDLINE | ID: mdl-35213314

ABSTRACT

Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, it remains a challenging task to make accurate predictions, as the small amount of training data and severe class imbalance reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighting loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a data augmentation preprocessing step with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our prediction on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
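
Two of the balance metrics named above, F1 and the Matthews correlation coefficient (MCC), follow directly from the confusion-matrix counts. The sketch below uses their standard definitions; the function names and the example counts are hypothetical and are not results from ctP2ISP.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented imbalanced example: 20 true interface residues among 1000.
counts = dict(tp=8, fp=2, fn=12, tn=978)
accuracy = (counts["tp"] + counts["tn"]) / sum(counts.values())
print(accuracy, f1_score(8, 2, 12), mcc(**counts))
```

On imbalanced data like this, plain accuracy stays high even though most true sites are missed, which is why metrics such as F1 and MCC are preferred for interaction-site prediction.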

Subjects

Neural Networks, Computer , Software , Benchmarking

15.

TransRNAm: Identifying Twelve Types of RNA Modifications by an Interpretable Multi-Label Deep Learning Model Based on Transformer.

Chen, Taoning; Wu, Tingfang; Pan, Deng; Xie, Jinxing; Zhi, Jin; Wang, Xuejiao; Quan, Lijun; Lyu, Qiang.

IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3623-3634, 2023.

Article in English | MEDLINE | ID: mdl-37607147

ABSTRACT

Accurate identification of RNA modification sites is of great significance in understanding the functions and regulatory mechanisms of RNAs. Recent advances have shown great promise in applying computational methods based on deep learning for accurate prediction of RNA modifications. However, those methods generally predicted only a single type of RNA modification. In addition, such methods suffered from a lack of interpretability of their predicted results. In this work, a new Transformer-based deep learning method, referred to as TransRNAm, was proposed to predict multiple RNA modifications simultaneously. More specifically, TransRNAm employs Transformer to extract contextual features and convolutional neural networks to further learn high-level latent feature representations of RNA sequences relevant for RNA modifications. Importantly, by integrating the self-attention mechanism in Transformer with convolutional neural networks, TransRNAm is capable of not only capturing the critical nucleotide sites that contribute significantly to RNA modification prediction, but also revealing the underlying association among different types of RNA modifications. Consequently, this work provided an accurate and interpretable predictor for multiple RNA modification prediction, which may contribute to uncovering the sequence-based forming mechanism of RNA modification sites.

Assuntos

Aprendizado Profundo , Redes Neurais de Computação , Nucleotídeos , RNA/genética
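
The TransRNAm abstract above describes predicting multiple RNA modification types simultaneously, which is a multi-label problem: one site may carry more than one label. As a hedged illustration of that output stage only (not of the Transformer or CNN layers), the sketch below applies an independent sigmoid to each label's score and keeps every label above a threshold. The label names and the 0.5 threshold are assumptions for the example.

```python
import math

def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_predict(logits, label_names, threshold=0.5):
    """Return every label whose independent sigmoid probability meets the threshold.

    Unlike softmax classification, labels are not mutually exclusive,
    so zero, one, or several labels can be returned for the same site.
    """
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]

# Illustrative label names; the twelve types used by TransRNAm are not listed here.
names = ["m6A", "m5C", "m1A"]
predicted = multilabel_predict([2.0, -1.0, 0.0], names)
```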

16.

DGCddG: Deep Graph Convolution for Predicting Protein-Protein Binding Affinity Changes Upon Mutations.

Jiang, Yelu; Quan, Lijun; Li, Kailong; Li, Yan; Zhou, Yiting; Wu, Tingfang; Lyu, Qiang.

IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2089-2100, 2023.

Article in English | MEDLINE | ID: mdl-37018301

ABSTRACT

Effectively and accurately predicting the effects of amino acid mutations on interactions between proteins is a key issue for understanding the mechanism of protein function and for drug design. In this study, we present a deep graph convolution (DGC) network-based framework, DGCddG, to predict the changes in protein-protein binding affinity after mutation. DGCddG incorporates multi-layer graph convolution to extract a deep, contextualized representation for each residue of the protein complex structure. The channels mined by DGC for the mutation sites are then fitted to the binding affinity with a multi-layer perceptron. Experimental results on multiple datasets show that our model can achieve relatively good performance for both single and multi-point mutations. For blind tests on datasets related to angiotensin-converting enzyme 2 (ACE2) binding with the SARS-CoV-2 virus, our method shows better results in predicting ACE2 changes, which may help in finding favorable antibodies. Code and data availability: https://github.com/lennylv/DGCddG.

Subjects

COVID-19 , Humans , Protein Binding/genetics , COVID-19/genetics , SARS-CoV-2/genetics , Mutation/genetics , Point Mutation
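
The DGCddG abstract above relies on multi-layer graph convolution over the residues of a protein complex. The abstract does not specify the layer, so the sketch below shows a generic graph convolution layer with self-loops and symmetric normalization, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), as an assumed stand-in. The function name `gcn_layer` and the toy inputs are hypothetical.

```python
import math

def gcn_layer(adj, feats, weight):
    """One generic graph convolution layer (not DGCddG's exact architecture).

    adj:    n x n adjacency matrix (e.g. residue contacts), 0/1 entries
    feats:  n x d_in node feature matrix
    weight: d_in x d_out weight matrix
    """
    n = len(adj)
    d_in = len(feats[0])
    d_out = len(weight[0])
    # Add self-loops so each residue keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate features from neighboring residues.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(d_in)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]

# Two residues in contact, one input feature, two output channels.
out = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0, -1.0]])
```

Stacking several such layers lets each residue's representation depend on progressively larger structural neighborhoods, which is the general idea behind a "deep, contextualized representation".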

17.

Enhancement of sludge dewaterability by repeated inoculation of acidified sludge: Extracellular polymeric substances molecular structure and microbial community succession.

Li, Yunbei; Fu, Chunyan; Cao, Xinyu; Wang, Xin; Wang, Ninghao; Zheng, Mengyu; Quan, Lijun; Lv, Jinghua; Guo, Zhensheng.

Chemosphere ; 339: 139714, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37543234

ABSTRACT

Improving the dewatering performance of sewage sludge is of great scientific and engineering significance in the context of accelerated urbanization and increasingly strict environmental regulations. Acidified sludge (AS) can improve sludge dewatering performance, but the dewatering effect of repeated inoculation is unclear. The effects of long-term repeated inoculation of AS on sludge dewaterability were investigated, with emphasis on the molecular structure of extracellular polymeric substances (EPS) and on microbial community succession. The results revealed that increasing the inoculation ratio of AS reduced the pH, the absolute value of the sludge zeta potential, and the sludge particle size, and the decreasing trend became more evident with prolonged treatment time. Under 30% and 50% AS inoculation, the dewatering performance of the sludge was significantly improved (p < 0.05). Compared with the raw sludge, the specific resistance of filtration (SRF) and capillary suction time at 30% inoculation were reduced by 64.3% and 50.1%, respectively, after 30 cycles. With the exception of loosely bound (LB)-EPS, both soluble (S)-EPS and tightly bound (TB)-EPS decreased visibly, and the protein in TB-EPS was significantly related to sludge dewaterability (p < 0.05). The fluorescent components of aromatic protein and fulvic acid-like substances in TB-EPS were significantly associated with SRF, with a correlation coefficient of 0.99 (p < 0.05). Both the increase in the percentage of random coil and the decrease in α-helix in TB-EPS contributed to improving dewaterability. Increasing Firmicutes and decreasing Chloroflexi levels improved the sludge dewatering capacity. Repeated inoculation did not disrupt the dewatering effect of AS but rather increased the feasibility of the engineering application of AS. Considering dewatering performance and cost together, a 30% AS inoculation ratio is feasible for practical applications.

Subjects

Extracellular Polymeric Substance Matrix , Sewage , Sewage/chemistry , Molecular Structure , Water/chemistry , Proteins/chemistry , Waste Disposal, Fluid/methods

18.

SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model.

Zhang, Yikang; Chu, Xiaomin; Jiang, Yelu; Wu, Hongjie; Quan, Lijun.

Genes (Basel) ; 13(4), 2022 03 23.

Article in English | MEDLINE | ID: mdl-35456374

ABSTRACT

A large number of inorganic and organic compounds are able to bind DNA and form complexes, among which drug-related molecules are important. Chromatin accessibility changes not only directly affect drug-DNA interactions, but can also promote or inhibit the expression of critical genes associated with drug resistance by affecting the DNA binding capacity of transcription factors (TFs) and transcriptional regulators. However, the biological experimental techniques for measuring chromatin accessibility are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome. Existing computational models mostly ignore the contextual information provided by the bases in gene sequences. To address these issues, we proposed a new solution called SemanticCAP. It introduces a gene language model that models the context of gene sequences and is thus able to provide an effective representation of a given site in a gene sequence. We merged the features provided by the gene language model into our chromatin accessibility model. In the process, we designed methods called SFA and SFC to make feature fusion smoother. Compared to DeepSEA, gkm-SVM, and k-mer on public benchmarks, our model showed better performance, with a 1.25% maximum improvement in auROC and a 2.41% maximum improvement in auPRC.

Subjects

Chromatin , Language , Chromatin/genetics , Chromatin Immunoprecipitation , DNA/genetics , Transcription Factors/genetics

19.

DeepNup: Prediction of Nucleosome Positioning from DNA Sequences Using Deep Neural Network.

Zhou, Yiting; Wu, Tingfang; Jiang, Yelu; Li, Yan; Li, Kailong; Quan, Lijun; Lyu, Qiang.

Genes (Basel) ; 13(11), 2022 10 30.

Article in English | MEDLINE | ID: mdl-36360220

ABSTRACT

Nucleosome positioning plays a vital role in diverse cellular biological processes by regulating the accessibility of DNA sequences to DNA-binding proteins. Previous studies have shown that the intrinsic preference of nucleosomes for DNA sequences may play a dominant role in nucleosome positioning. As a consequence, it is nontrivial to develop computational methods based only on DNA sequence information that accurately identify nucleosome positioning and thereby verify the contribution of DNA sequences to nucleosome positioning. In this work, we propose a new deep learning-based method, named DeepNup, which improves the prediction of nucleosome positioning from DNA sequences alone. Specifically, we first use a hybrid feature encoding scheme that combines one-hot encoding and trinucleotide composition encoding to encode raw DNA sequences; afterwards, we employ multiscale convolutional neural network modules that consist of two parallel convolution kernels with different sizes and gated recurrent units to effectively learn the local and global correlation feature representations; lastly, we use a fully connected layer and a sigmoid unit serving as a classifier to integrate these learned high-order feature representations and generate the final prediction outcomes. On the experimental evaluation metrics for two benchmark nucleosome positioning datasets, DeepNup achieves better performance for nucleosome positioning prediction than several state-of-the-art methods. These results demonstrate that DeepNup is a powerful deep learning-based tool that enables one to accurately identify potential nucleosome sequences.

Subjects

Nucleosomes , Saccharomyces cerevisiae , Nucleosomes/genetics , Nucleosomes/metabolism , Base Sequence , Saccharomyces cerevisiae/genetics , Chromatin Assembly and Disassembly , Neural Networks, Computer
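
The DeepNup abstract above names two encodings for raw DNA: one-hot and trinucleotide composition. How the paper combines them is not stated in the abstract, so the following is only a sketch of the two encodings computed separately, assuming trinucleotide composition means the relative frequency of each of the 64 overlapping 3-mers. The function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# All 64 possible 3-mers in lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=3)]

def one_hot(seq):
    """L x 4 matrix; bases outside A/C/G/T (e.g. N) become all-zero rows."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def trinucleotide_composition(seq):
    """64-dimensional vector of overlapping 3-mer frequencies."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    windows = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip windows containing ambiguous bases
            counts[kmer] += 1
            windows += 1
    return [counts[k] / windows if windows else 0.0 for k in TRINUCLEOTIDES]

position_features = one_hot("ACGT")                   # per-position encoding
sequence_features = trinucleotide_composition("ACGT")  # whole-sequence encoding
```

One-hot encoding preserves the position of every base, while the composition vector summarizes short-range sequence content independently of position, which is one plausible reason to use the two together.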

20.

Learning Useful Representations of DNA Sequences From ChIP-Seq Datasets for Exploring Transcription Factor Binding Specificities.

Quan, Lijun; Sun, Xiaoyu; Wu, Jian; Mei, Jie; Huang, Liqun; He, Ruji; Nie, Liangpeng; Chen, Yu; Lyu, Qiang.

IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 998-1008, 2022.

Article in English | MEDLINE | ID: mdl-32976105

ABSTRACT

Deep learning has been successfully applied to surprisingly different domains. Researchers and practitioners are employing trained deep learning models to enrich our knowledge. Transcription factors (TFs) are essential for regulating gene expression in all organisms by binding to specific DNA sequences. Here, we designed a deep learning model named SemanticCS (Semantic ChIP-seq) to predict TF binding specificities. We trained our learning model on an ensemble of ChIP-seq datasets (Multi-TF-cell) to learn useful intermediate features across multiple TFs and cells. To interpret these feature vectors, visualization analysis was used. Our results indicate that these learned representations can be used to train shallow machines for other tasks. Using diverse experimental data and evaluation metrics, we show that SemanticCS outperforms other popular methods. In addition, from experimental data, SemanticCS can help to identify the substitutions that cause regulatory abnormalities and to evaluate the effect of substitutions on the binding affinity for the RXR transcription factor. The online server for SemanticCS is freely available at http://qianglab.scst.suda.edu.cn/semanticCS/.

Subjects

Chromatin Immunoprecipitation Sequencing , Transcription Factors , Base Sequence , Binding Sites/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
