Results 1 - 20 of 798
1.
Annu Rev Cell Dev Biol ; 35: 357-379, 2019 10 06.
Article in English | MEDLINE | ID: mdl-31283382

ABSTRACT

Eukaryotic transcription factors (TFs) from the same structural family tend to bind similar DNA sequences, despite the ability of these TFs to execute distinct functions in vivo. The cell partly resolves this specificity paradox through combinatorial strategies and the use of low-affinity binding sites, which are better able to distinguish between similar TFs. However, because these sites have low affinity, it is challenging to understand how TFs recognize them in vivo. Here, we summarize recent findings and technological advancements that allow for the quantification and mechanistic interpretation of TF recognition across a wide range of affinities. We propose a model that integrates insights from the fields of genetics and cell biology to provide further conceptual understanding of TF binding specificity. We argue that in eukaryotes, target specificity is driven by an inhomogeneous 3D nuclear distribution of TFs and by variation in DNA binding affinity such that locally elevated TF concentration allows low-affinity binding sites to be functional.
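
The closing argument of this abstract can be illustrated with a simple equilibrium binding calculation. The sketch below assumes 1:1 binding at a single site, where occupancy = [TF] / ([TF] + Kd). This is a textbook simplification and not the model proposed in the review; the concentrations and dissociation constants are hypothetical values chosen only for illustration.

```python
def occupancy(tf_conc_nm: float, kd_nm: float) -> float:
    """Equilibrium fractional occupancy of one site under simple 1:1 binding."""
    if tf_conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return tf_conc_nm / (tf_conc_nm + kd_nm)


# Hypothetical values (nM), not taken from the review.
HIGH_AFFINITY_KD = 10.0    # strong site
LOW_AFFINITY_KD = 1000.0   # weak site
BULK_CONC = 50.0           # assumed average nuclear TF concentration
LOCAL_CONC = 2000.0        # assumed concentration in a locally enriched region

if __name__ == "__main__":
    for label, conc in (("bulk", BULK_CONC), ("local", LOCAL_CONC)):
        print(label,
              "high-affinity:", round(occupancy(conc, HIGH_AFFINITY_KD), 3),
              "low-affinity:", round(occupancy(conc, LOW_AFFINITY_KD), 3))
```

Under these assumed numbers, the low-affinity site is nearly empty at the bulk concentration but about two-thirds occupied inside the locally enriched region.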


Subjects
Eukaryota/metabolism , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Animals , Binding Sites , Gene Expression Regulation , Humans
2.
Annu Rev Biochem ; 86: 567-583, 2017 06 20.
Article in English | MEDLINE | ID: mdl-28654325

ABSTRACT

Multidrug resistance is a global threat as the clinically available potent antibiotic drugs are becoming exceedingly scarce. For example, increasing drug resistance among gram-positive bacteria is responsible for approximately one-third of nosocomial infections. As ribosomes are a major target for these drugs, they may serve as suitable targets for the development of next-generation antibiotics. Three-dimensional structures of ribosomal particles from Staphylococcus aureus obtained by X-ray crystallography have shed light on fine details of drug binding sites and have revealed unique structural motifs specific for this pathogenic strain, which may be used for the design of novel, degradable, pathogen-specific and hence environmentally friendly drugs.


Subjects
Anti-Bacterial Agents/chemical synthesis , Bacterial Proteins/chemistry , Drug Design , Ribosomes/drug effects , Staphylococcus aureus/drug effects , Anti-Bacterial Agents/metabolism , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/antagonists & inhibitors , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Binding Sites , Cross Infection/drug therapy , Cross Infection/microbiology , Crystallography, X-Ray , Deinococcus/drug effects , Deinococcus/genetics , Deinococcus/metabolism , Drug Resistance, Multiple, Bacterial , Escherichia coli/drug effects , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression , Humans , Models, Molecular , Ribosomes/metabolism , Ribosomes/ultrastructure , Staphylococcal Infections/drug therapy , Staphylococcal Infections/microbiology , Staphylococcus aureus/genetics , Staphylococcus aureus/metabolism , Thermus thermophilus/drug effects , Thermus thermophilus/genetics , Thermus thermophilus/metabolism
3.
Mol Cell ; 83(12): 1970-1982.e6, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37327775

ABSTRACT

Pioneer transcription factors are essential for cell fate changes by targeting closed chromatin. OCT4 is a crucial pioneer factor that can induce cell reprogramming. However, the structural basis of how pioneer factors recognize the in vivo nucleosomal DNA targets is unknown. Here, we determine the high-resolution structures of the nucleosome containing human LIN28B DNA and its complexes with the OCT4 DNA binding region. Three OCT4s bind the pre-positioned nucleosome by recognizing non-canonical DNA sequences. Two use their POUS domains while the other uses the POUS-loop-POUHD region; POUHD serves as a wedge to unwrap ∼25 base pair DNA. Our analysis of previous genomic data and determination of the ESRRB-nucleosome-OCT4 structure confirmed the generality of these structural features. Moreover, biochemical studies suggest that multiple OCT4s cooperatively open the H1-condensed nucleosome array containing the LIN28B nucleosome. Thus, our study suggests a mechanism of how OCT4 can target the nucleosome and open closed chromatin.


Subjects
Chromatin , Nucleosomes , Octamer Transcription Factor-3 , RNA-Binding Proteins , Humans , Base Sequence , Cellular Reprogramming , Chromatin/genetics , DNA/metabolism , Nucleosomes/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Octamer Transcription Factor-3/genetics , Octamer Transcription Factor-3/metabolism
4.
Mol Cell ; 80(3): 470-484.e8, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33053322

ABSTRACT

Cellular responses to environmental stress are frequently mediated by RNA-binding proteins (RBPs). Here, we examined global RBP dynamics in Saccharomyces cerevisiae in response to glucose starvation and heat shock. Each stress induced rapid remodeling of the RNA-protein interactome without corresponding changes in RBP abundance. Consistent with general translation shutdown, ribosomal proteins contacting the mRNA showed decreased RNA association. Among translation components, RNA association was most reduced for initiation factors involved in 40S scanning (eukaryotic initiation factor 4A [eIF4A], eIF4B, and Ded1), indicating a common mechanism of translational repression. In unstressed cells, eIF4A, eIF4B, and Ded1 primarily targeted the 5' ends of mRNAs. Following glucose withdrawal, 5' binding was abolished within 30 s, explaining the rapid translation shutdown, but mRNAs remained stable. Heat shock induced progressive loss of 5' RNA binding by initiation factors over ∼16 min and provoked mRNA degradation, particularly for translation-related factors, mediated by Xrn1. Taken together, these results reveal mechanisms underlying translational control of gene expression during stress.


Subjects
Peptide Initiation Factors/metabolism , Protein Biosynthesis/physiology , Stress, Physiological/physiology , 5' Untranslated Regions , DEAD-box RNA Helicases/metabolism , Eukaryotic Initiation Factor-4A/metabolism , Eukaryotic Initiation Factor-4G/metabolism , Eukaryotic Initiation Factors/metabolism , Glucose/metabolism , Heat-Shock Response/physiology , Peptide Initiation Factors/physiology , RNA, Messenger/genetics , RNA-Binding Proteins/metabolism , Ribosomal Proteins/metabolism , Ribosomal Proteins/physiology , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
5.
Mol Cell ; 74(2): 245-253.e6, 2019 04 18.
Article in English | MEDLINE | ID: mdl-30826165

ABSTRACT

Transcription factors (TFs) control gene expression by binding DNA recognition sites in genomic regulatory regions. Although most forkhead TFs recognize a canonical forkhead (FKH) motif, RYAAAYA, some forkheads recognize a completely different (FHL) motif, GACGC. Bispecific forkhead proteins recognize both motifs, but the molecular basis for bispecific DNA recognition is not understood. We present co-crystal structures of the FoxN3 DNA binding domain bound to the FKH and FHL sites, respectively. FoxN3 adopts a similar conformation to recognize both motifs, making contacts with different DNA bases using the same amino acids. However, the DNA structure is different in the two complexes. These structures reveal how a single TF binds two unrelated DNA sequences and the importance of DNA shape in the mechanism of bispecific recognition.
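
The two motifs named in this abstract are written with IUPAC nucleotide codes (R = A or G, Y = C or T). As a minimal sketch, the code below expands each motif into a regular expression and reports which motif classes occur in a sequence. The function names are hypothetical, and only the forward strand is scanned, which is a simplifying assumption; a real scan would also check the reverse complement.

```python
import re

# IUPAC codes needed for the two motifs in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Expand an IUPAC motif string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


FKH = motif_to_regex("RYAAAYA")  # canonical forkhead motif
FHL = motif_to_regex("GACGC")    # alternative FHL motif


def classify_site(seq: str) -> list:
    """Return the motif classes found on the forward strand of seq."""
    seq = seq.upper()
    hits = []
    if FKH.search(seq):
        hits.append("FKH")
    if FHL.search(seq):
        hits.append("FHL")
    return hits


if __name__ == "__main__":
    for s in ("ttGTAAACAgg", "ccGACGCtt", "CCCCCC"):
        print(s, classify_site(s))
```

Exact pattern matching of this kind ignores DNA shape, which the abstract identifies as important for bispecific recognition, so it should be read only as an illustration of the motif notation.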


Subjects
Cell Cycle Proteins/chemistry , DNA-Binding Proteins/chemistry , DNA/chemistry , Nucleic Acid Conformation , Repressor Proteins/chemistry , Amino Acid Sequence/genetics , Base Sequence/genetics , Binding Sites/genetics , Cell Cycle Proteins/genetics , Crystallography, X-Ray , DNA/genetics , DNA-Binding Proteins/genetics , Forkhead Transcription Factors , Gene Expression Regulation/genetics , Humans , Multiprotein Complexes/chemistry , Multiprotein Complexes/genetics , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Repressor Proteins/genetics
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38990514

ABSTRACT

Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been invented to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To deal with the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy leverages protein-protein binding site information to enhance protein-peptide binding site recognition, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments, including predicting the apo protein-peptide, protein-cyclic peptide and AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.


Subjects
Peptides , Binding Sites , Peptides/chemistry , Peptides/metabolism , Protein Binding , Computational Biology/methods , Algorithms , Proteins/chemistry , Proteins/metabolism , Machine Learning
7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701417

ABSTRACT

Transcription factors (TFs) are proteins essential for regulating gene transcription by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate predictions of TFBSs can contribute to the design and construction of metabolic regulatory systems based on TFs. Although various deep-learning algorithms have been developed for predicting TFBSs, the prediction performance needs to be improved. This paper proposes a bidirectional encoder representations from transformers (BERT)-based model, called BERT-TFBS, to predict TFBSs solely based on DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM) and an output module. The BERT-TFBS model utilizes the pre-trained DNABERT-2 module to acquire the complex long-term dependencies in DNA sequences through a transfer learning approach, and applies the CNN module and the CBAM to extract high-order local features. The proposed model is trained and tested on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations and comparisons with other models. The experimental results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs, and they show that the proposed model outperforms other deep-learning models. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.


Subjects
Neural Networks, Computer , Transcription Factors , Transcription Factors/metabolism , Transcription Factors/genetics , Binding Sites , Algorithms , Computational Biology/methods , Humans , Deep Learning , Protein Binding
8.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39101501

ABSTRACT

Engineering enzyme-substrate binding pockets is the most efficient approach for modifying catalytic activity, but is limited if the substrate binding sites are indistinct. Here, we developed a 3D convolutional neural network for predicting protein-ligand binding sites. The network integrates DenseNet, UNet, and self-attention for extracting features and recovering sample size. We attempted to enlarge the dataset by data augmentation, and the model achieved success rates of 48.4%, 35.5%, and 43.6% at a precision of ≥50% and 52%, 47.6%, and 58.1%. The distance between the predicted and real centers is ≤4 Å, which is based on SC6K, COACH420, and BU48 validation datasets. The substrate binding sites of Klebsiella variicola acid phosphatase (KvAP) and Bacillus anthracis proline 4-hydroxylase (BaP4H) were predicted using DUnet, showing highly competitive performance: 53.8% and 56% of the predicted binding sites critically affected the catalysis of KvAP and BaP4H. Virtual saturation mutagenesis was applied based on the predicted binding sites of KvAP, and the top-ranked 10 single mutations contributed to stronger enzyme-substrate binding varied while the predicted sites were different. The advantage of DUnet for predicting key residues responsible for enzyme activity further promoted the success rate of virtual mutagenesis. This study highlighted the significance of correctly predicting key binding sites for enzyme engineering.


Subjects
Machine Learning , Binding Sites , Protein Engineering/methods , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Acid Phosphatase/chemistry , Acid Phosphatase/genetics , Acid Phosphatase/metabolism , Substrate Specificity , Bacillus anthracis/genetics , Bacillus anthracis/enzymology , Klebsiella/genetics , Klebsiella/enzymology , Ligands , Protein Binding , Models, Molecular , Neural Networks, Computer
9.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38174583

ABSTRACT

Bioluminescence in beetles has long fascinated biologists, with diverse applications in biotechnology. To date, however, our understanding of its evolutionary origin and functional variation mechanisms remains poor. To address these questions, we obtained high-quality reference genomes of luminous and nonluminous beetles in 6 Elateroidea families. We then reconstructed a robust phylogenetic relationship for all luminous families and related nonluminous families. Comparative genomic analyses and biochemical functional experiments suggested that gene evolution within Elateroidea played a crucial role in the origin of bioluminescence, with multiple parallel origins observed in the luminous beetle families. While most luciferase-like proteins exhibited a conserved nonluminous amino acid pattern (TLA346 to 348) in the luciferin-binding sites, luciferases in the different luminous beetle families showed divergent luminous patterns at these sites (TSA/CCA/CSA/LVA). Comparisons of the structural and enzymatic properties of ancestral, extant, and site-directed mutant luciferases further reinforced the important role of these sites in the trade-off between acyl-CoA synthetase and luciferase activities. Furthermore, the evolution of bioluminescent color demonstrated a tendency toward hypsochromic shifts and variations among the luminous families. Taken together, our results revealed multiple parallel origins of bioluminescence and functional divergence within the beetle bioluminescent system.


Subjects
Coleoptera , Animals , Humans , Coleoptera/genetics , Phylogeny , Amino Acid Sequence , Luciferases/genetics , Luciferases/chemistry , Luciferases/metabolism , Binding Sites
10.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38149460

ABSTRACT

Evolution of gene expression mediated by cis-regulatory changes is thought to be an important contributor to organismal adaptation, but identifying adaptive cis-regulatory changes is challenging due to the difficulty in knowing the expectation under no positive selection. A new approach for detecting positive selection on transcription factor binding sites (TFBSs) was recently developed, thanks to the application of machine learning in predicting transcription factor (TF) binding affinities of DNA sequences. Given a TFBS sequence from a focal species and the corresponding inferred ancestral sequence that differs from the former at n sites, one can predict the TF-binding affinities of many n-step mutational neighbors of the ancestral sequence and obtain a null distribution of the derived binding affinity, which allows testing whether the binding affinity of the real derived sequence deviates significantly from the null distribution. Applying this test genomically to all experimentally identified binding sites of 3 TFs in humans, a recent study reported positive selection for elevated binding affinities of TFBSs. Here, we show that this genomic test suffers from an ascertainment bias because, even in the absence of positive selection for strengthened binding, the binding affinities of known human TFBSs are more likely to have increased than decreased in evolution. We demonstrate by computer simulation that this bias inflates the false positive rate of the selection test. We propose several methods to mitigate the ascertainment bias and show that almost all previously reported positive selection signals disappear when these methods are applied.
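
The test described in this abstract can be sketched as a small resampling procedure. The code below is a minimal illustration, not the authors' implementation. It interprets an n-step mutational neighbor as a sequence that differs from the ancestral sequence at exactly n positions, and it replaces the machine-learning affinity predictor with a hypothetical toy scoring function (`toy_affinity`). It does not include any correction for the ascertainment bias the paper reports.

```python
import random
from typing import Callable, Tuple

BASES = "ACGT"


def random_n_step_neighbor(ancestral: str, n: int, rng: random.Random) -> str:
    """Mutate n distinct positions of the ancestral sequence to a different base."""
    seq = list(ancestral)
    for pos in rng.sample(range(len(ancestral)), n):
        seq[pos] = rng.choice([b for b in BASES if b != ancestral[pos]])
    return "".join(seq)


def selection_test(ancestral: str, derived: str,
                   affinity: Callable[[str], float],
                   n_samples: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed affinity change, one-sided empirical p-value)."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must have equal length")
    n = sum(a != d for a, d in zip(ancestral, derived))
    rng = random.Random(seed)
    base = affinity(ancestral)
    observed = affinity(derived) - base
    # Null distribution: affinity change of random n-step neighbors.
    null = [affinity(random_n_step_neighbor(ancestral, n, rng)) - base
            for _ in range(n_samples)]
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + sum(x >= observed for x in null)) / (1 + n_samples)
    return observed, p_value


def toy_affinity(seq: str) -> float:
    """Hypothetical stand-in for a predictor: matches to an assumed consensus."""
    consensus = "GTAAACA"
    return sum(s == c for s, c in zip(seq, consensus))


if __name__ == "__main__":
    print(selection_test("GTCCCCA", "GTAAACA", toy_affinity))
```

A small p-value here means only that the derived sequence scores higher than most random neighbors under the toy scorer. As the abstract argues, such a result is not by itself evidence of positive selection when the tested sites were ascertained because they bind in the focal species.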


Subjects
Genomics , Transcription Factors , Humans , Transcription Factors/metabolism , Computer Simulation , Binding Sites/genetics , Protein Binding
11.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36748992

ABSTRACT

Interactions between DNA and transcription factors (TFs) play an essential role in understanding transcriptional regulation mechanisms and gene expression. Owing to the large accumulation of training data and their low cost, deep learning methods have shown great potential in determining the specificity of TF-DNA interactions. Convolutional network-based and self-attention network-based methods have been proposed for transcription factor binding site (TFBS) prediction. Convolutional operations extract local features efficiently but tend to ignore global information, while self-attention mechanisms excel at capturing long-distance dependencies but struggle to attend to local feature details. To discover features of a given sequence that are as comprehensive as possible, we propose a Dual-branch model combining Self-Attention and Convolution, dubbed DSAC, which fuses local features and global representations in an interactive way. In terms of features, convolution and self-attention contribute to feature extraction collaboratively, enhancing representation learning. In terms of structure, a lightweight but efficient network architecture is designed for the prediction; in particular, the dual-branch structure allows the convolution and the self-attention mechanism to be fully utilized to improve the predictive ability of our model. The experimental results on 165 ChIP-seq datasets show that DSAC clearly outperforms five other deep learning-based methods and demonstrate that our model can effectively predict TFBSs based on sequence features alone. The source code of DSAC is available at https://github.com/YuBinLab-QUST/DSAC/.


Subjects
DNA , Neural Networks, Computer , Protein Binding , Binding Sites , Transcription Factors/genetics
12.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37824738

ABSTRACT

The interactions between nucleic acids and proteins are important in diverse biological processes. The high-quality prediction of nucleic-acid-binding sites continues to pose a significant challenge. Presently, the predictive efficacy of sequence-based methods is constrained by their exclusive consideration of sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Though protein structures predicted by AlphaFold2 could be used, the extensive computing requirement of AlphaFold2 hinders its use for genome-wide applications. Based on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold predicted structures. Here, the predicted protein structures are employed to construct protein structural graph with residues as nodes and spatially neighboring residue pairs for edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict binding sites with DNA and RNA, respectively. Through comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and be comparable with structure-based methods. Moreover, the prediction was shown useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, codes, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
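
The graph construction step mentioned in this abstract (residues as nodes, spatially neighboring residue pairs as edges) can be sketched as follows. This is an illustration under stated assumptions, not the GLMSite code: each residue is represented by a single 3D coordinate (for example its C-alpha atom), and the distance cutoff is a hypothetical parameter, since the abstract does not state how neighbors are defined.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Sequence[float]


def build_residue_graph(coords: List[Coord],
                        cutoff: float = 10.0) -> List[Tuple[int, int]]:
    """Return undirected edges (i, j), i < j, for residues within cutoff (in Å)."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    # Four made-up residue positions along a line (Å).
    toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(build_residue_graph(toy_coords, cutoff=5.0))
```

The all-pairs loop is quadratic in the number of residues, which is acceptable for a single protein; a spatial index would be a natural replacement for large-scale use.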


Subjects
Neural Networks, Computer , Proteins , Binding Sites , Proteins/chemistry , RNA/metabolism , DNA , Language
13.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38113077

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has spurred a wide range of approaches to control and combat the disease. However, selecting an effective antiviral drug target remains a time-consuming challenge. Computational methods offer a promising solution by efficiently reducing the number of candidates. In this study, we propose a structure- and deep learning-based approach that identifies vulnerable regions in viral proteins corresponding to drug binding sites. Our approach takes into account the protein dynamics, accessibility and mutability of the binding site and the putative mechanism of action of the drug. We applied this technique to validate drug targeting toward severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein S. Our findings reveal a conformation- and oligomer-specific glycan-free binding site proximal to the receptor binding domain. This site comprises topologically important amino acid residues. Molecular dynamics simulations of Spike in complex with candidate drug molecules bound to the potential binding sites indicate an equilibrium shifted toward the inactive conformation compared with drug-free simulations. Small molecules targeting this binding site have the potential to prevent the closed-to-open conformational transition of Spike, thereby allosterically inhibiting its interaction with human angiotensin-converting enzyme 2 receptor. Using a pseudotyped virus-based assay with a SARS-CoV-2 neutralizing antibody, we identified a set of hit compounds that exhibited inhibition at micromolar concentrations.


Subjects
COVID-19 , Deep Learning , Humans , Protein Binding , Binding Sites , SARS-CoV-2/metabolism , Molecular Dynamics Simulation , Antibodies, Viral , Spike Glycoprotein, Coronavirus/metabolism
14.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36398911

ABSTRACT

Identification of RNA-small molecule binding sites plays an essential role in RNA-targeted drug discovery and development. These small molecules are expected to be lead compounds that guide the development of new types of RNA-targeted therapeutics, as distinct from conventional therapeutics targeting proteins. RNAs can provide many potential drug targets with diverse structures and functions. However, only a few prediction methods have been proposed so far, and predicting RNA-small molecule binding sites remains a major challenge. A new computational model is required to better extract features and predict RNA-small molecule binding sites more accurately. In this paper, a deep learning model, RLBind, is proposed to predict RNA-small molecule binding sites from sequence-dependent and structure-dependent properties by combining a global RNA sequence channel and a local neighbor nucleotide channel. To the best of our knowledge, this research is the first to develop a convolutional neural network for RNA-small molecule binding site prediction. Furthermore, RLBind can also be used as a potential tool when the experimental tertiary structure of the RNA is not available. The experimental results show that RLBind outperforms other state-of-the-art methods in predicting binding sites. Therefore, our study demonstrates that combining global information from full-length sequences with local information from a limited neighborhood of nucleotides can improve the model's predictive performance for binding site prediction. All datasets and source code are available at https://github.com/KailiWang1/RLBind.


Subjects
Deep Learning , RNA , RNA/metabolism , Algorithms , Protein Binding , Ligands , Binding Sites
15.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37096593

ABSTRACT

While research into drug-target interaction (DTI) prediction is fairly mature, generalizability and interpretability are not always addressed in existing work in this field. In this paper, we propose a deep learning (DL)-based framework, called BindingSite-AugmentedDTA, which improves drug-target affinity (DTA) predictions by reducing the search space of potential binding sites of the protein, thus making the binding affinity prediction more efficient and accurate. Our BindingSite-AugmentedDTA is highly generalizable, as it can be integrated with any DL-based regression model while significantly improving its prediction performance. Also, unlike many existing models, our model is highly interpretable due to its architecture and self-attention mechanism, which can provide a deeper understanding of its underlying prediction mechanism by mapping attention weights back to protein-binding sites. The computational results confirm that our framework can enhance the prediction performance of seven state-of-the-art DTA prediction algorithms in terms of four widely used evaluation metrics, including concordance index, mean squared error, modified squared correlation coefficient ($r^2_m$) and the area under the precision curve. We also contribute to three benchmark drug-target interaction datasets by including additional information on the 3D structure of all proteins contained in those datasets, which include the two most commonly used datasets, namely Kiba and Davis, as well as the data from the IDG-DREAM drug-kinase binding prediction challenge. Furthermore, we experimentally validate the practical potential of our proposed framework through in-lab experiments. The relatively high agreement between computationally predicted and experimentally observed binding interactions supports the potential of our framework as the next-generation pipeline for prediction models in drug repurposing.
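
Two of the evaluation metrics named in this abstract, the concordance index and the mean squared error, are simple enough to sketch directly. The code below uses the pairwise definition of the concordance index that is common in affinity prediction work (fraction of comparable pairs ranked in the correct order, with prediction ties counted as half); whether the paper uses exactly this variant is an assumption.

```python
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of pairs with different true values that are ranked correctly."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            comparable += 1
            d_true = y_true[i] - y_true[j]
            d_pred = y_pred[i] - y_pred[j]
            if d_pred == 0:
                concordant += 0.5  # tied predictions count as half
            elif (d_true > 0) == (d_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up affinities, for illustration only.
    truth = [5.0, 6.2, 7.1, 8.4]
    preds = [5.3, 6.0, 7.5, 7.4]
    print(concordance_index(truth, preds), mean_squared_error(truth, preds))
```

The concordance index depends only on ranking, so a model can score well on it while having a large mean squared error, which is one reason the two are usually reported together.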


Subjects
Algorithms , Drug Repositioning , Drug Development , Proteins/chemistry , Binding Sites
16.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37328639

ABSTRACT

Precise targeting of transcription factor binding sites (TFBSs) is essential to comprehending transcriptional regulatory processes and investigating cellular function. Although several deep learning algorithms have been created to predict TFBSs, the models' intrinsic mechanisms and prediction results are difficult to explain. There is still room for improvement in prediction performance. We present DeepSTF, a unique deep-learning architecture for predicting TFBSs by integrating DNA sequence and shape profiles. We use the improved transformer encoder structure for the first time in the TFBSs prediction approach. DeepSTF extracts DNA higher-order sequence features using stacked convolutional neural networks (CNNs), whereas rich DNA shape profiles are extracted by combining improved transformer encoder structure and bidirectional long short-term memory (Bi-LSTM), and, finally, the derived higher-order sequence features and representative shape profiles are integrated into the channel dimension to achieve accurate TFBSs prediction. Experiments on 165 ENCODE chromatin immunoprecipitation sequencing (ChIP-seq) datasets show that DeepSTF considerably outperforms several state-of-the-art algorithms in predicting TFBSs, and we explain the usefulness of the transformer encoder structure and the combined strategy using sequence features and shape profiles in capturing multiple dependencies and learning essential features. In addition, this paper examines the significance of DNA shape features predicting TFBSs. The source code of DeepSTF is available at https://github.com/YuBinLab-QUST/DeepSTF/.


Subjects
DNA , Neural Networks, Computer , Binding Sites , Protein Binding , DNA/genetics , DNA/chemistry , Transcription Factors/genetics , Transcription Factors/chemistry
17.
BMC Bioinformatics ; 25(1): 156, 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38641811

ABSTRACT

BACKGROUND: Accurately identifying drug-target interaction (DTI), affinity (DTA), and binding sites (DTS) is crucial for drug screening, repositioning, and design, as well as for understanding the functions of targets. Although there are a few online platforms based on deep learning for drug-target interaction, affinity, and binding site identification, there is currently no integrated online platform for all three aspects. RESULTS: Our solution, the novel integrated online platform Drug-Online, has been developed to facilitate drug screening, target identification, and understanding the functions of targets in a progressive manner of "interaction-affinity-binding sites". The Drug-Online platform consists of three parts: the first part uses the drug-target interaction identification method MGraphDTA, based on graph neural networks (GNN) and convolutional neural networks (CNN), to identify whether there is a drug-target interaction. If an interaction is identified, the second part employs the drug-target affinity identification method MMDTA, also based on GNN and CNN, to calculate the strength of the drug-target interaction, i.e., affinity. Finally, the third part identifies drug-target binding sites, i.e., pockets. The method pt-lm-gnn used in this part is also based on GNN. CONCLUSIONS: Drug-Online is a reliable online platform that integrates drug-target interaction, affinity, and binding site identification. It is freely available via the Internet at http://39.106.7.26:8000/Drug-Online/ .


Subjects
Deep Learning , Drug Interactions , Binding Sites , Drug Delivery Systems , Drug Evaluation, Preclinical
18.
BMC Bioinformatics ; 25(1): 122, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38515052

ABSTRACT

BACKGROUND: Nanobodies, also known as VHH or single-domain antibodies, are unique antibody fragments derived solely from heavy chains. They offer advantages of both small molecules and conventional antibodies, making them promising therapeutics. The paratope is the specific region on an antibody that binds to an antigen. Paratope prediction involves the identification and characterization of the antigen-binding site on an antibody. This process is crucial for understanding the specificity and affinity of antibody-antigen interactions. Various computational methods and experimental approaches have been developed to predict and analyze paratopes, contributing to advancements in antibody engineering, drug development, and immunotherapy. However, existing predictive models trained on traditional antibodies may not be suitable for nanobodies. Additionally, the limited availability of nanobody datasets poses challenges in constructing accurate models. METHODS: To address these challenges, we have developed a novel nanobody prediction model, named NanoBERTa-ASP (Antibody Specificity Prediction), which is specifically designed for predicting nanobody-antigen binding sites. The model adopts a training strategy more suitable for nanobodies, based on an advanced natural language processing (NLP) model called BERT (Bidirectional Encoder Representations from Transformers). To be more specific, the model utilizes a masked language modeling approach named RoBERTa (Robustly Optimized BERT Pretraining Approach) to learn the contextual information of the nanobody sequence and predict its binding site. RESULTS: NanoBERTa-ASP achieved exceptional performance in predicting nanobody binding sites, outperforming existing methods, indicating its proficiency in capturing sequence information specific to nanobodies and accurately identifying their binding sites. Furthermore, NanoBERTa-ASP provides insights into the interaction mechanisms between nanobodies and antigens, contributing to a better understanding of nanobodies and facilitating the design and development of nanobodies with therapeutic potential. CONCLUSION: NanoBERTa-ASP represents a significant advancement in nanobody paratope prediction. Its superior performance highlights the potential of deep learning approaches in nanobody research. By leveraging the increasing volume of nanobody data, NanoBERTa-ASP can further refine its predictions, enhance its performance, and contribute to the development of novel nanobody-based therapeutics. GitHub repository: https://github.com/WangLabforComputationalBiology/NanoBERTa-ASP.


Subjects
Single-Domain Antibodies , Antibody Binding Sites , Single-Domain Antibodies/chemistry , Antibodies , Binding Sites , Antibody Specificity
19.
Plant J ; 116(1): 234-250, 2023 10.
Article in English | MEDLINE | ID: mdl-37387536

ABSTRACT

Enhancers are critical cis-regulatory elements controlling gene expression during cell development and differentiation. However, genome-wide enhancer characterization has been challenging due to the lack of a well-defined relationship between enhancers and genes. Function-based methods are the gold standard for determining the biological function of cis-regulatory elements; however, these methods have not been widely applied to plants. Here, we applied a massively parallel reporter assay on Arabidopsis to measure enhancer activities across the genome. We identified 4327 enhancers with various combinations of epigenetic modifications that are distinct from those of animal enhancers. Furthermore, we showed that enhancers differ from promoters in their preference for transcription factors. Although some enhancers are not conserved and overlap with transposable elements forming clusters, enhancers are generally conserved across a thousand Arabidopsis accessions, suggesting they are selected under evolutionary pressure and could play critical roles in the regulation of important genes. Moreover, comparative analysis reveals that enhancers identified by different strategies do not overlap, suggesting these methods are complementary in nature. In sum, we systematically investigated the features of enhancers identified by a functional assay in A. thaliana, which lays the foundation for further investigation into enhancers' functional mechanisms in plants.


Subjects
Arabidopsis , Animals , Arabidopsis/genetics , Genetic Enhancer Elements/genetics , Genetic Promoter Regions/genetics , Transcription Factors/genetics , Genetic Epigenesis
20.
Hum Mol Genet ; 31(R1): R114-R122, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36083269

ABSTRACT

Every cell in the human body inherits a copy of the same genetic information. The three billion base pairs of DNA in the human genome, and the roughly 50 000 coding and non-coding genes they contain, must thus encode all the complexity of human development and cell and tissue type diversity. Differences in gene regulation, or the modulation of gene expression, enable individual cells to interpret the genome differently to carry out their specific functions. Here we discuss recent and ongoing efforts to build gene regulatory maps, which aim to characterize the regulatory roles of all sequences in a genome. Many researchers and consortia have identified such regulatory elements using functional assays and evolutionary analyses; we discuss the results, strengths and shortcomings of their approaches. We also discuss new techniques the field can leverage and emerging challenges it will face while striving to build gene regulatory maps of ever-increasing resolution and comprehensiveness.


Subjects
Gene Expression Regulation , Nucleic Acid Regulatory Sequences , Humans , Gene Expression Regulation/genetics , Human Genome/genetics , Chromosome Mapping , DNA/genetics