Results 1 - 20 of 829
1.
Annu Rev Cell Dev Biol ; 35: 357-379, 2019 10 06.
Article in English | MEDLINE | ID: mdl-31283382

ABSTRACT

Eukaryotic transcription factors (TFs) from the same structural family tend to bind similar DNA sequences, despite the ability of these TFs to execute distinct functions in vivo. The cell partly resolves this specificity paradox through combinatorial strategies and the use of low-affinity binding sites, which are better able to distinguish between similar TFs. However, because these sites have low affinity, it is challenging to understand how TFs recognize them in vivo. Here, we summarize recent findings and technological advancements that allow for the quantification and mechanistic interpretation of TF recognition across a wide range of affinities. We propose a model that integrates insights from the fields of genetics and cell biology to provide further conceptual understanding of TF binding specificity. We argue that in eukaryotes, target specificity is driven by an inhomogeneous 3D nuclear distribution of TFs and by variation in DNA binding affinity such that locally elevated TF concentration allows low-affinity binding sites to be functional.


Subjects
Eukaryota/metabolism; Regulatory Sequences, Nucleic Acid; Transcription Factors/metabolism; Animals; Binding Sites; Gene Expression Regulation; Humans
2.
Annu Rev Biochem ; 86: 567-583, 2017 06 20.
Article in English | MEDLINE | ID: mdl-28654325

ABSTRACT

Multidrug resistance is a global threat as the clinically available potent antibiotic drugs are becoming exceedingly scarce. For example, increasing drug resistance among gram-positive bacteria is responsible for approximately one-third of nosocomial infections. As ribosomes are a major target for these drugs, they may serve as suitable objects for novel development of next-generation antibiotics. Three-dimensional structures of ribosomal particles from Staphylococcus aureus obtained by X-ray crystallography have shed light on fine details of drug binding sites and have revealed unique structural motifs specific for this pathogenic strain, which may be used for the design of novel degradable pathogen-specific, and hence, environmentally friendly drugs.


Subjects
Anti-Bacterial Agents/chemical synthesis; Bacterial Proteins/chemistry; Drug Design; Ribosomes/drug effects; Staphylococcus aureus/drug effects; Anti-Bacterial Agents/metabolism; Anti-Bacterial Agents/pharmacology; Bacterial Proteins/antagonists & inhibitors; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Binding Sites; Cross Infection/drug therapy; Cross Infection/microbiology; Crystallography, X-Ray; Deinococcus/drug effects; Deinococcus/genetics; Deinococcus/metabolism; Drug Resistance, Multiple, Bacterial; Escherichia coli/drug effects; Escherichia coli/genetics; Escherichia coli/metabolism; Gene Expression; Humans; Models, Molecular; Ribosomes/metabolism; Ribosomes/ultrastructure; Staphylococcal Infections/drug therapy; Staphylococcal Infections/microbiology; Staphylococcus aureus/genetics; Staphylococcus aureus/metabolism; Thermus thermophilus/drug effects; Thermus thermophilus/genetics; Thermus thermophilus/metabolism
3.
Mol Cell ; 83(12): 1970-1982.e6, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37327775

ABSTRACT

Pioneer transcription factors are essential for cell fate changes by targeting closed chromatin. OCT4 is a crucial pioneer factor that can induce cell reprogramming. However, the structural basis of how pioneer factors recognize the in vivo nucleosomal DNA targets is unknown. Here, we determine the high-resolution structures of the nucleosome containing human LIN28B DNA and its complexes with the OCT4 DNA binding region. Three OCT4s bind the pre-positioned nucleosome by recognizing non-canonical DNA sequences. Two use their POUS domains while the other uses the POUS-loop-POUHD region; POUHD serves as a wedge to unwrap ∼25 base pair DNA. Our analysis of previous genomic data and determination of the ESRRB-nucleosome-OCT4 structure confirmed the generality of these structural features. Moreover, biochemical studies suggest that multiple OCT4s cooperatively open the H1-condensed nucleosome array containing the LIN28B nucleosome. Thus, our study suggests a mechanism of how OCT4 can target the nucleosome and open closed chromatin.


Subjects
Chromatin; Nucleosomes; Octamer Transcription Factor-3; RNA-Binding Proteins; Humans; Base Sequence; Cellular Reprogramming; Chromatin/genetics; DNA/metabolism; Nucleosomes/genetics; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Octamer Transcription Factor-3/genetics; Octamer Transcription Factor-3/metabolism
4.
Mol Cell ; 80(3): 470-484.e8, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33053322

ABSTRACT

Cellular responses to environmental stress are frequently mediated by RNA-binding proteins (RBPs). Here, we examined global RBP dynamics in Saccharomyces cerevisiae in response to glucose starvation and heat shock. Each stress induced rapid remodeling of the RNA-protein interactome without corresponding changes in RBP abundance. Consistent with general translation shutdown, ribosomal proteins contacting the mRNA showed decreased RNA association. Among translation components, RNA association was most reduced for initiation factors involved in 40S scanning (eukaryotic initiation factor 4A [eIF4A], eIF4B, and Ded1), indicating a common mechanism of translational repression. In unstressed cells, eIF4A, eIF4B, and Ded1 primarily targeted the 5' ends of mRNAs. Following glucose withdrawal, 5' binding was abolished within 30 s, explaining the rapid translation shutdown, but mRNAs remained stable. Heat shock induced progressive loss of 5' RNA binding by initiation factors over ∼16 min and provoked mRNA degradation, particularly for translation-related factors, mediated by Xrn1. Taken together, these results reveal mechanisms underlying translational control of gene expression during stress.


Subjects
Peptide Initiation Factors/metabolism; Protein Biosynthesis/physiology; Stress, Physiological/physiology; 5' Untranslated Regions; DEAD-box RNA Helicases/metabolism; Eukaryotic Initiation Factor-4A/metabolism; Eukaryotic Initiation Factor-4G/metabolism; Eukaryotic Initiation Factors/metabolism; Glucose/metabolism; Heat-Shock Response/physiology; Peptide Initiation Factors/physiology; RNA, Messenger/genetics; RNA-Binding Proteins/metabolism; Ribosomal Proteins/metabolism; Ribosomal Proteins/physiology; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism
5.
Mol Cell ; 74(2): 245-253.e6, 2019 04 18.
Article in English | MEDLINE | ID: mdl-30826165

ABSTRACT

Transcription factors (TFs) control gene expression by binding DNA recognition sites in genomic regulatory regions. Although most forkhead TFs recognize a canonical forkhead (FKH) motif, RYAAAYA, some forkheads recognize a completely different (FHL) motif, GACGC. Bispecific forkhead proteins recognize both motifs, but the molecular basis for bispecific DNA recognition is not understood. We present co-crystal structures of the FoxN3 DNA binding domain bound to the FKH and FHL sites, respectively. FoxN3 adopts a similar conformation to recognize both motifs, making contacts with different DNA bases using the same amino acids. However, the DNA structure is different in the two complexes. These structures reveal how a single TF binds two unrelated DNA sequences and the importance of DNA shape in the mechanism of bispecific recognition.
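
The two motifs named in this abstract, RYAAAYA (FKH) and GACGC (FHL), are written with IUPAC nucleotide codes (R = A or G, Y = C or T). The sketch below is not taken from the article; it only illustrates, under that standard IUPAC reading, how such motifs can be located on both strands of a sequence. The function names `find_motif` and `reverse_complement` are hypothetical helpers introduced for illustration.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FKH = "RYAAAYA"  # canonical forkhead motif quoted in the abstract
FHL = "GACGC"    # alternative motif quoted in the abstract


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """Return sorted (start, strand) hits of an IUPAC motif on both strands.

    Start positions are 0-based and always given in forward-strand coordinates.
    """
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # Map reverse-strand match positions back onto the forward strand.
    hits += [(len(seq) - m.start() - len(motif), "-") for m in pattern.finditer(rc)]
    return sorted(hits)
```

For example, `find_motif("TTGTAAACAGGGACGCTT", FKH)` reports a forward-strand hit at position 2 (GTAAACA), and the same sequence contains the FHL motif at position 11.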


Subjects
Cell Cycle Proteins/chemistry; DNA-Binding Proteins/chemistry; DNA/chemistry; Nucleic Acid Conformation; Repressor Proteins/chemistry; Amino Acid Sequence/genetics; Base Sequence/genetics; Binding Sites/genetics; Cell Cycle Proteins/genetics; Crystallography, X-Ray; DNA/genetics; DNA-Binding Proteins/genetics; Forkhead Transcription Factors; Gene Expression Regulation/genetics; Humans; Multiprotein Complexes/chemistry; Multiprotein Complexes/genetics; Nucleotide Motifs/genetics; Regulatory Sequences, Nucleic Acid/genetics; Repressor Proteins/genetics
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38990514

ABSTRACT

Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been invented to supplement it. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To deal with the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, the transfer learning strategy is implemented for leveraging the protein-protein binding sites information to enhance the protein-peptide binding sites recognition capability, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates the state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments including predicting the apo protein-peptide, protein-cyclic peptide and the AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.


Subjects
Peptides; Binding Sites; Peptides/chemistry; Peptides/metabolism; Protein Binding; Computational Biology/methods; Algorithms; Proteins/chemistry; Proteins/metabolism; Machine Learning
7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701417

ABSTRACT

Transcription factors (TFs) are proteins essential for regulating genetic transcriptions by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate predictions of TFBSs can contribute to the design and construction of metabolic regulatory systems based on TFs. Although various deep-learning algorithms have been developed for predicting TFBSs, the prediction performance needs to be improved. This paper proposes a bidirectional encoder representations from transformers (BERT)-based model, called BERT-TFBS, to predict TFBSs solely based on DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM) and an output module. The BERT-TFBS model utilizes the pre-trained DNABERT-2 module to acquire the complex long-term dependencies in DNA sequences through a transfer learning approach, and applies the CNN module and the CBAM to extract high-order local features. The proposed model is trained and tested based on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations and comparisons with other models. The experimental results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs, and they show that the proposed model outperforms other deep-learning models. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.


Subjects
Neural Networks, Computer; Transcription Factors; Transcription Factors/metabolism; Transcription Factors/genetics; Binding Sites; Algorithms; Computational Biology/methods; Humans; Deep Learning; Protein Binding
8.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39101501

ABSTRACT

Engineering enzyme-substrate binding pockets is the most efficient approach for modifying catalytic activity, but it is limited when the substrate binding sites are indistinct. Here, we developed a 3D convolutional neural network for predicting protein-ligand binding sites. The network integrates DenseNet, UNet, and self-attention for extracting features and recovering sample size. We attempted to enlarge the dataset by data augmentation, and the model achieved success rates of 48.4%, 35.5%, and 43.6% at a precision of ≥50%, and 52%, 47.6%, and 58.1% when the distance between the predicted and real centers was ≤4 Å, on the SC6K, COACH420, and BU48 validation datasets, respectively. The substrate binding sites of Klebsiella variicola acid phosphatase (KvAP) and Bacillus anthracis proline 4-hydroxylase (BaP4H) were predicted using DUnet, showing highly competitive performance: 53.8% and 56% of the predicted binding sites critically affected the catalysis of KvAP and BaP4H, respectively. Virtual saturation mutagenesis was applied based on the predicted binding sites of KvAP, and the 10 top-ranked single mutations contributing to stronger enzyme-substrate binding varied when the predicted sites differed. The advantage of DUnet for predicting key residues responsible for enzyme activity further improved the success rate of virtual mutagenesis. This study highlights the significance of correctly predicting key binding sites for enzyme engineering.


Subjects
Machine Learning; Binding Sites; Protein Engineering/methods; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Acid Phosphatase/chemistry; Acid Phosphatase/genetics; Acid Phosphatase/metabolism; Substrate Specificity; Bacillus anthracis/genetics; Bacillus anthracis/enzymology; Klebsiella/genetics; Klebsiella/enzymology; Ligands; Protein Binding; Models, Molecular; Neural Networks, Computer
9.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39350338

ABSTRACT

Accurate prediction of transcription factor binding sites (TFBSs) is essential for understanding gene regulation mechanisms and the etiology of diseases. Despite numerous advances in deep learning for predicting TFBSs, their performance can still be enhanced. In this study, we propose MLSNet, a novel deep learning architecture designed specifically to predict TFBSs. MLSNet innovatively integrates multisize convolutional fusion with long short-term memory (LSTM) networks to effectively capture DNA-sparse higher-order sequence features. Further, MLSNet incorporates super token attention and Bi-LSTM to systematically extract and integrate higher-order DNA shape features. Experimental results on 165 ChIP-seq (chromatin immunoprecipitation followed by sequencing) datasets indicate that MLSNet consistently outperforms several state-of-the-art algorithms in the prediction of TFBSs. Specifically, MLSNet reports average metrics: 0.8306 for ACC, 0.8992 for AUROC, and 0.9035 for AUPRC, surpassing the second-best methods by 1.82%, 1.68%, and 1.54%, respectively. This research delineates the effectiveness of combining multi-size convolutional layers with LSTM and DNA shape-based features in enhancing predictive accuracy. Moreover, this study comprehensively assesses the variability in model performance across different cell lines and transcription factors. The source code of MLSNet is available at https://github.com/minghaidea/MLSNet.
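
The abstract reports accuracy (ACC) and area under the ROC curve (AUROC) for a binary TFBS classifier. The sketch below is not the authors' evaluation code; it shows the textbook definitions only. The 0.5 decision threshold in `accuracy` is an assumption, and `auroc` uses the pairwise (Mann-Whitney) formulation, which is quadratic in the number of samples and meant for illustration rather than large datasets.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label.

    The default threshold of 0.5 is an assumption, not a value from the abstract.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With labels `[1, 0, 1, 0]` and scores `[0.9, 0.1, 0.4, 0.6]`, three of the four positive-negative pairs are ranked correctly, so `auroc` gives 0.75, while `accuracy` at the assumed threshold gives 0.5.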


Subjects
Deep Learning; Transcription Factors; Transcription Factors/metabolism; Binding Sites; Algorithms; Computational Biology/methods; Humans; Chromatin Immunoprecipitation Sequencing/methods; DNA/metabolism; DNA/chemistry
10.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38149460

ABSTRACT

Evolution of gene expression mediated by cis-regulatory changes is thought to be an important contributor to organismal adaptation, but identifying adaptive cis-regulatory changes is challenging due to the difficulty in knowing the expectation under no positive selection. A new approach for detecting positive selection on transcription factor binding sites (TFBSs) was recently developed, thanks to the application of machine learning in predicting transcription factor (TF) binding affinities of DNA sequences. Given a TFBS sequence from a focal species and the corresponding inferred ancestral sequence that differs from the former at n sites, one can predict the TF-binding affinities of many n-step mutational neighbors of the ancestral sequence and obtain a null distribution of the derived binding affinity, which allows testing whether the binding affinity of the real derived sequence deviates significantly from the null distribution. Applying this test genomically to all experimentally identified binding sites of 3 TFs in humans, a recent study reported positive selection for elevated binding affinities of TFBSs. Here, we show that this genomic test suffers from an ascertainment bias because, even in the absence of positive selection for strengthened binding, the binding affinities of known human TFBSs are more likely to have increased than decreased in evolution. We demonstrate by computer simulation that this bias inflates the false positive rate of the selection test. We propose several methods to mitigate the ascertainment bias and show that almost all previously reported positive selection signals disappear when these methods are applied.
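
The test described in this abstract compares the affinity of the real derived sequence with a null distribution built from n-step mutational neighbors of the ancestral sequence. The sketch below is a simplified illustration of that idea only, not the published method and not the bias corrections the authors propose. `toy_affinity` is a hypothetical stand-in for a machine-learning affinity predictor, and the sample size, seed, and add-one p-value correction are assumptions.

```python
import random

BASES = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def n_step_neighbor(ancestral, n, rng):
    """Mutate exactly n distinct positions of `ancestral` to a different base."""
    seq = list(ancestral)
    for i in rng.sample(range(len(seq)), n):
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)


def selection_test(ancestral, derived, affinity, n_samples=1000, seed=0):
    """One-sided empirical p-value for elevated affinity of `derived`.

    The null distribution is sampled from random n-step neighbors of the
    ancestral sequence, where n is the ancestral-derived Hamming distance.
    """
    rng = random.Random(seed)
    n = hamming(ancestral, derived)
    observed = affinity(derived)
    null = [affinity(n_step_neighbor(ancestral, n, rng)) for _ in range(n_samples)]
    exceed = sum(a >= observed for a in null)
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_samples + 1)


def toy_affinity(seq, consensus="GACGC"):
    """Hypothetical affinity score: number of matches to a consensus site."""
    return sum(x == y for x, y in zip(seq, consensus))
```

Calling `selection_test("GTCTC", "GACGC", toy_affinity)` asks how often a random two-step neighbor of the ancestral sequence scores at least as high as the observed derived sequence. As the abstract stresses, applying such a test only to sites already known to bind can bias the outcome toward apparent positive selection.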


Subjects
Genomics; Transcription Factors; Humans; Transcription Factors/metabolism; Computer Simulation; Binding Sites/genetics; Protein Binding
11.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38174583

ABSTRACT

Bioluminescence in beetles has long fascinated biologists, with diverse applications in biotechnology. To date, however, our understanding of its evolutionary origin and functional variation mechanisms remains poor. To address these questions, we obtained high-quality reference genomes of luminous and nonluminous beetles in 6 Elateroidea families. We then reconstructed a robust phylogenetic relationship for all luminous families and related nonluminous families. Comparative genomic analyses and biochemical functional experiments suggested that gene evolution within Elateroidea played a crucial role in the origin of bioluminescence, with multiple parallel origins observed in the luminous beetle families. While most luciferase-like proteins exhibited a conserved nonluminous amino acid pattern (TLA346 to 348) in the luciferin-binding sites, luciferases in the different luminous beetle families showed divergent luminous patterns at these sites (TSA/CCA/CSA/LVA). Comparisons of the structural and enzymatic properties of ancestral, extant, and site-directed mutant luciferases further reinforced the important role of these sites in the trade-off between acyl-CoA synthetase and luciferase activities. Furthermore, the evolution of bioluminescent color demonstrated a tendency toward hypsochromic shifts and variations among the luminous families. Taken together, our results revealed multiple parallel origins of bioluminescence and functional divergence within the beetle bioluminescent system.


Subjects
Coleoptera; Animals; Humans; Coleoptera/genetics; Phylogeny; Amino Acid Sequence; Luciferases/genetics; Luciferases/chemistry; Luciferases/metabolism; Binding Sites
12.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37824738

ABSTRACT

The interactions between nucleic acids and proteins are important in diverse biological processes. The high-quality prediction of nucleic-acid-binding sites continues to pose a significant challenge. Presently, the predictive efficacy of sequence-based methods is constrained by their exclusive consideration of sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Though protein structures predicted by AlphaFold2 could be used, the extensive computing requirement of AlphaFold2 hinders its use for genome-wide applications. Based on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold predicted structures. Here, the predicted protein structures are employed to construct protein structural graph with residues as nodes and spatially neighboring residue pairs for edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict binding sites with DNA and RNA, respectively. Through comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and be comparable with structure-based methods. Moreover, the prediction was shown useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, codes, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
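
The abstract describes a protein structural graph with residues as nodes and spatially neighboring residue pairs as edges. The sketch below is not GLMSite's code; it shows one common way to build such a graph from per-residue coordinates. The use of one point per residue (for example, C-alpha atoms) and the 10 Å default cutoff are assumptions, since the abstract does not state how neighbors are defined.

```python
import math


def build_residue_graph(coords, cutoff=10.0):
    """Build an undirected residue contact graph.

    coords: list of (x, y, z) tuples, one per residue, in angstroms
            (assumed here to be C-alpha positions).
    cutoff: distance threshold in angstroms; the default is an assumed value.
    Returns a list of edges (i, j) with i < j, in ascending order.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Residues are neighbors when their points lie within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges
```

For four residues placed at x = 0, 3.8, 7.6, and 30 Å along a line, a 5 Å cutoff links only consecutive residues 0-1 and 1-2, while the default cutoff also links 0-2; the distant fourth residue stays isolated in both cases.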


Subjects
Neural Networks, Computer; Proteins; Binding Sites; Proteins/chemistry; RNA/metabolism; DNA; Language
13.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36748992

ABSTRACT

Understanding interactions between DNA and transcription factors (TFs) is essential for elucidating transcriptional regulation mechanisms and gene expression. Owing to the large accumulation of training data and their low cost, deep learning methods have shown great potential in determining the specificity of TF-DNA interactions. Convolutional network-based and self-attention network-based methods have been proposed for transcription factor binding site (TFBS) prediction. Convolutional operations extract local features efficiently but tend to ignore global information, while self-attention mechanisms excel at capturing long-distance dependencies but struggle to attend to local feature details. To discover features for a given sequence that are as comprehensive as possible, we propose a Dual-branch model combining Self-Attention and Convolution, dubbed DSAC, which fuses local features and global representations in an interactive way. In terms of features, convolution and self-attention contribute to feature extraction collaboratively, enhancing representation learning. In terms of structure, a lightweight but efficient network architecture is designed for the prediction; in particular, the dual-branch structure allows the convolution and the self-attention mechanism to be fully utilized to improve the predictive ability of our model. Experimental results on 165 ChIP-seq datasets show that DSAC clearly outperforms five other deep learning-based methods and demonstrate that our model can effectively predict TFBSs based on sequence features alone. The source code of DSAC is available at https://github.com/YuBinLab-QUST/DSAC/.


Subjects
DNA; Neural Networks, Computer; Protein Binding; Binding Sites; Transcription Factors/genetics
14.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38113077

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has spurred a wide range of approaches to control and combat the disease. However, selecting an effective antiviral drug target remains a time-consuming challenge. Computational methods offer a promising solution by efficiently reducing the number of candidates. In this study, we propose a structure- and deep learning-based approach that identifies vulnerable regions in viral proteins corresponding to drug binding sites. Our approach takes into account the protein dynamics, accessibility and mutability of the binding site and the putative mechanism of action of the drug. We applied this technique to validate drug targeting toward severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein S. Our findings reveal a conformation- and oligomer-specific glycan-free binding site proximal to the receptor binding domain. This site comprises topologically important amino acid residues. Molecular dynamics simulations of Spike in complex with candidate drug molecules bound to the potential binding sites indicate an equilibrium shifted toward the inactive conformation compared with drug-free simulations. Small molecules targeting this binding site have the potential to prevent the closed-to-open conformational transition of Spike, thereby allosterically inhibiting its interaction with human angiotensin-converting enzyme 2 receptor. Using a pseudotyped virus-based assay with a SARS-CoV-2 neutralizing antibody, we identified a set of hit compounds that exhibited inhibition at micromolar concentrations.


Subjects
COVID-19; Deep Learning; Humans; Protein Binding; Binding Sites; SARS-CoV-2/metabolism; Molecular Dynamics Simulation; Antibodies, Viral; Spike Glycoprotein, Coronavirus/metabolism
15.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36398911

ABSTRACT

Identification of RNA-small molecule binding sites plays an essential role in RNA-targeted drug discovery and development. These small molecules are expected to serve as lead compounds guiding the development of new types of RNA-targeted therapeutics, in contrast to conventional therapeutics that target proteins. RNAs can provide many potential drug targets with diverse structures and functions. However, only a few methods have been proposed so far, and predicting RNA-small molecule binding sites remains a major challenge. New computational models are required to better extract features and predict RNA-small molecule binding sites more accurately. In this paper, a deep learning model, RLBind, is proposed to predict RNA-small molecule binding sites from sequence-dependent and structure-dependent properties by combining a global RNA sequence channel and a local neighbor nucleotides channel. To the best of our knowledge, this research is the first to develop a convolutional neural network for RNA-small molecule binding site prediction. Furthermore, RLBind can also be used as a potential tool when the experimental RNA tertiary structure is not available. The experimental results show that RLBind outperforms other state-of-the-art methods in predicting binding sites. Therefore, our study demonstrates that combining global information from full-length sequences with local information from a limited set of neighboring nucleotides in RNAs can improve the model's predictive performance for binding site prediction. All datasets and resource codes are available at https://github.com/KailiWang1/RLBind.


Subjects
Deep Learning; RNA; RNA/metabolism; Algorithms; Protein Binding; Ligands; Binding Sites
16.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37096593

ABSTRACT

While research into drug-target interaction (DTI) prediction is fairly mature, generalizability and interpretability are not always addressed in existing work in this field. In this paper, we propose a deep learning (DL)-based framework, called BindingSite-AugmentedDTA, which improves drug-target affinity (DTA) predictions by reducing the search space of potential binding sites of the protein, thus making the binding affinity prediction more efficient and accurate. Our BindingSite-AugmentedDTA is highly generalizable, as it can be integrated with any DL-based regression model while significantly improving its prediction performance. Also, unlike many existing models, our model is highly interpretable due to its architecture and self-attention mechanism, which can provide a deeper understanding of its underlying prediction mechanism by mapping attention weights back to protein-binding sites. The computational results confirm that our framework can enhance the prediction performance of seven state-of-the-art DTA prediction algorithms in terms of four widely used evaluation metrics, including concordance index, mean squared error, modified squared correlation coefficient ($r^2_m$) and the area under the precision curve. We also contribute to three benchmark drug-target interaction datasets by including additional information on the 3D structure of all proteins contained in those datasets, which include the two most commonly used datasets, namely Kiba and Davis, as well as the data from the IDG-DREAM drug-kinase binding prediction challenge. Furthermore, we experimentally validate the practical potential of our proposed framework through in-lab experiments. The relatively high agreement between computationally predicted and experimentally observed binding interactions supports the potential of our framework as the next-generation pipeline for prediction models in drug repurposing.
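
Of the evaluation metrics listed in this abstract, the concordance index is the least self-explanatory: it measures how often a model orders pairs of samples the same way as their true affinities. The sketch below is not the authors' implementation; it follows the usual definition, in which pairs with tied true values are skipped and tied predictions count as half concordant. It is quadratic in the number of samples and intended only as an illustration.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are not comparable and are skipped;
    pairs with equal predictions contribute 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            d_true = y_true[i] - y_true[j]
            if d_true == 0:
                continue  # tied true affinities carry no ordering information
            comparable += 1
            d_pred = y_pred[i] - y_pred[j]
            if d_pred == 0:
                concordant += 0.5
            elif (d_true > 0) == (d_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable
```

A perfectly ordered prediction scores 1.0, a fully reversed one scores 0.0, and a constant prediction scores 0.5, which is why values near 0.5 indicate no ranking ability.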


Subjects
Algorithms; Drug Repositioning; Drug Development; Proteins/chemistry; Binding Sites
17.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37328639

ABSTRACT

Precise targeting of transcription factor binding sites (TFBSs) is essential to comprehending transcriptional regulatory processes and investigating cellular function. Although several deep learning algorithms have been created to predict TFBSs, the models' intrinsic mechanisms and prediction results are difficult to explain. There is still room for improvement in prediction performance. We present DeepSTF, a unique deep-learning architecture for predicting TFBSs by integrating DNA sequence and shape profiles. We use the improved transformer encoder structure for the first time in the TFBSs prediction approach. DeepSTF extracts DNA higher-order sequence features using stacked convolutional neural networks (CNNs), whereas rich DNA shape profiles are extracted by combining improved transformer encoder structure and bidirectional long short-term memory (Bi-LSTM), and, finally, the derived higher-order sequence features and representative shape profiles are integrated into the channel dimension to achieve accurate TFBSs prediction. Experiments on 165 ENCODE chromatin immunoprecipitation sequencing (ChIP-seq) datasets show that DeepSTF considerably outperforms several state-of-the-art algorithms in predicting TFBSs, and we explain the usefulness of the transformer encoder structure and the combined strategy using sequence features and shape profiles in capturing multiple dependencies and learning essential features. In addition, this paper examines the significance of DNA shape features predicting TFBSs. The source code of DeepSTF is available at https://github.com/YuBinLab-QUST/DeepSTF/.


Subjects
DNA; Neural Networks, Computer; Binding Sites; Protein Binding; DNA/genetics; DNA/chemistry; Transcription Factors/genetics; Transcription Factors/chemistry
18.
Methods ; 231: 26-36, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39270885

ABSTRACT

Interactions among biological molecules are considered primary factors in the life cycle of an organism. Various important biological functions depend on such interactions and, among the different kinds of interactions, protein-DNA interactions are very important for transcription, regulation of gene expression, and DNA repair and packaging. Thus, knowledge of such interactions and of their sites is necessary to study the mechanisms of various biological processes. As experimental identification through biological assays is resource-demanding, costly and error-prone, scientists opt for computational methods for efficient and accurate identification of such DNA-protein interaction sites. Herein, we propose a novel and accurate method, DeepDBS, for the identification of DNA-binding sites in proteins using the primary amino acid sequences of the proteins under study. From protein sequences, deep representations were computed through a one-dimensional convolutional neural network (1D-CNN), a recurrent neural network (RNN) and a long short-term memory (LSTM) network and were then used to train a Random Forest classifier. Random Forest with LSTM-based features outperformed the other models, as well as the existing state-of-the-art methods, with an accuracy score of 0.99 for the self-consistency test, 10-fold cross-validation, 5-fold cross-validation, and jackknife validation, and 0.92 for independent dataset testing. Based on these results, it is concluded that DeepDBS can support accurate and efficient identification of DNA-binding sites (DBS) in proteins.

19.
BMC Bioinformatics ; 25(1): 156, 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38641811

ABSTRACT

BACKGROUND: Accurately identifying drug-target interaction (DTI), affinity (DTA), and binding sites (DTS) is crucial for drug screening, repositioning, and design, as well as for understanding target function. Although a few deep learning-based online platforms exist for identifying drug-target interaction, affinity, or binding sites, there is currently no integrated online platform that covers all three. RESULTS: We developed Drug-Online, a novel integrated online platform that supports drug screening, target identification, and the study of target function in a progressive "interaction-affinity-binding sites" workflow. The platform consists of three parts. The first part uses the drug-target interaction identification method MGraphDTA, based on graph neural networks (GNN) and convolutional neural networks (CNN), to determine whether a drug and a target interact. If an interaction is identified, the second part uses the drug-target affinity identification method MMDTA, also based on GNN and CNN, to calculate the strength of the interaction, i.e., the affinity. Finally, the third part identifies the drug-target binding sites, i.e., pockets, using the method pt-lm-gnn, which is also based on GNN. CONCLUSIONS: Drug-Online is a reliable online platform that integrates the identification of drug-target interaction, affinity, and binding sites. It is freely available at http://39.106.7.26:8000/Drug-Online/ .


Subjects
Deep Learning , Drug Interactions , Binding Sites , Drug Delivery Systems , Drug Evaluation, Preclinical
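
The Drug-Online abstract above describes a progressive workflow in which affinity and binding sites are evaluated only after an interaction has been identified. The control flow can be sketched as follows. Everything here is an assumption made for illustration: the function `progressive_screen`, its arguments, and the stand-in predictors are hypothetical and do not reflect the platform's actual interface or the MGraphDTA, MMDTA, or pt-lm-gnn APIs.

```python
def progressive_screen(drug, target, predict_interaction,
                       predict_affinity, predict_pockets):
    """Run the interaction -> affinity -> binding-site stages in order.

    The three predictors are caller-supplied callables (hypothetical stand-ins
    for the platform's models). Later stages run only if an interaction is found.
    """
    result = {
        "interacts": bool(predict_interaction(drug, target)),
        "affinity": None,
        "pockets": None,
    }
    if not result["interacts"]:
        return result  # stop early: nothing to quantify or localize
    result["affinity"] = predict_affinity(drug, target)
    result["pockets"] = predict_pockets(drug, target)
    return result


# Toy stand-ins with made-up outputs, used only to exercise the control flow.
hit = progressive_screen(
    "drug-A", "target-X",
    predict_interaction=lambda d, t: True,
    predict_affinity=lambda d, t: 7.2,
    predict_pockets=lambda d, t: [12, 45, 46],
)
miss = progressive_screen(
    "drug-B", "target-X",
    predict_interaction=lambda d, t: False,
    predict_affinity=lambda d, t: 7.2,
    predict_pockets=lambda d, t: [12, 45, 46],
)
print(hit)
print(miss)
```

Gating the later stages on the first one avoids spending affinity and pocket predictions on pairs that are not expected to interact.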
20.
BMC Bioinformatics ; 25(1): 122, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38515052

ABSTRACT

BACKGROUND: Nanobodies, also known as VHH or single-domain antibodies, are unique antibody fragments derived solely from heavy chains. They combine advantages of small molecules and conventional antibodies, making them promising therapeutics. The paratope is the specific region of an antibody that binds to an antigen, and paratope prediction involves identifying and characterizing this antigen-binding site. This process is crucial for understanding the specificity and affinity of antibody-antigen interactions. Various computational methods and experimental approaches have been developed to predict and analyze paratopes, contributing to advances in antibody engineering, drug development, and immunotherapy. However, existing predictive models trained on traditional antibodies may not be suitable for nanobodies, and the limited availability of nanobody datasets makes it difficult to construct accurate models. METHODS: To address these challenges, we developed a novel nanobody prediction model, NanoBERTa-ASP (Antibody Specificity Prediction), designed specifically for predicting nanobody-antigen binding sites. The model adopts a training strategy better suited to nanobodies, based on the natural language processing (NLP) model BERT (Bidirectional Encoder Representations from Transformers). More specifically, it uses RoBERTa (Robustly Optimized BERT Pretraining Approach), which is trained by masked language modeling, to learn the contextual information of the nanobody sequence and predict its binding site. RESULTS: NanoBERTa-ASP achieved exceptional performance in predicting nanobody binding sites and outperformed existing methods, indicating its proficiency in capturing sequence information specific to nanobodies and accurately identifying their binding sites. Furthermore, NanoBERTa-ASP provides insights into the interaction mechanisms between nanobodies and antigens, contributing to a better understanding of nanobodies and facilitating the design and development of nanobodies with therapeutic potential. CONCLUSION: NanoBERTa-ASP represents a significant advance in nanobody paratope prediction. Its superior performance highlights the potential of deep learning approaches in nanobody research. By leveraging the increasing volume of nanobody data, NanoBERTa-ASP can further refine its predictions, enhance its performance, and contribute to the development of novel nanobody-based therapeutics. GitHub repository: https://github.com/WangLabforComputationalBiology/NanoBERTa-ASP.


Subjects
Single-Domain Antibodies , Binding Sites, Antibody , Single-Domain Antibodies/chemistry , Antibodies , Binding Sites , Antibody Specificity
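
The NanoBERTa-ASP abstract above refers to masked language modeling on nanobody sequences. A minimal sketch of the masking step is shown below, under stated assumptions: the 15% mask rate, the `<mask>` token string, the function name `mask_residues`, and the example amino-acid string are illustrative choices, not details from the NanoBERTa-ASP repository. Real BERT-style training also sometimes keeps or randomly replaces selected tokens, which is omitted here.

```python
import random

MASK = "<mask>"  # placeholder token; the real tokenizer's mask token may differ


def mask_residues(sequence, mask_rate=0.15, seed=0):
    """Hide a random subset of residues and return (tokens, labels).

    `labels` maps each masked position to the residue the model must recover.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    rng = random.Random(seed)  # seeded for reproducible masking
    tokens = list(sequence)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    labels = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK
    return tokens, labels


# Arbitrary amino-acid string used only as an illustration.
example = "QVQLVESGGGLVQAGGSLRL"
tokens, labels = mask_residues(example)
print(tokens)
print(labels)
```

During pretraining the model sees `tokens` and is scored on how well it predicts the residues stored in `labels`, which pushes it to learn the sequence context around each position.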