Results 1 - 20 of 31
1.
Article in English | MEDLINE | ID: mdl-38739505

ABSTRACT

This study tackles the challenge of predicting RNA-small molecule binding sites, with a view to their potential value as RNA drug targets. To address this challenge, we propose the MultiModRLBP method, which integrates multi-modal features using deep learning algorithms. These features include 3D structural properties at the nucleotide base level of the RNA molecule, relational graphs based on overall RNA structure, and rich RNA semantic information. In our investigation, we gathered 851 interactions between RNA and small-molecule ligands from the RNAglib dataset and the RLBind training set. Unlike conventional training sets, this collection also includes RNA complexes that have the same RNA sequence but different binding sites due to structural differences or the presence of different ligands. This enhancement enables the MultiModRLBP model to more accurately capture subtle changes at the structural level, ultimately improving its ability to discern nuances among similar RNA conformations. Furthermore, we evaluated MultiModRLBP on two classic test sets, Test18 and Test3, highlighting its performance disparities on small molecules based on metal and non-metal ions. Additionally, we conducted a structural sensitivity analysis on specific complex categories, considering RNA instances with varying degrees of structural changes and whether they share the same ligands. The results indicate that MultiModRLBP outperforms the current state-of-the-art methods on multiple classic test sets, particularly excelling in predicting binding sites for non-metal ions and instances where the binding sites are widely distributed along the sequence. MultiModRLBP can also be used as a potential tool when the RNA structure is perturbed or the experimental RNA tertiary structure is not available. Most importantly, MultiModRLBP can distinguish the binding characteristics of RNAs that are structurally diverse yet similar in sequence. These advancements hold promise for reducing the costs associated with the development of RNA-targeted drugs.

2.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38175759

ABSTRACT

MOTIVATION: Binding of peptides to major histocompatibility complex (MHC) molecules plays a crucial role in triggering T cell recognition mechanisms essential for immune response. Accurate prediction of MHC-peptide binding is vital for the development of cancer therapeutic vaccines. While recent deep learning-based methods have achieved significant performance in predicting MHC-peptide binding affinity, most of them encode MHC molecules and peptides separately as inputs, potentially overlooking critical interaction information between the two. RESULTS: In this work, we propose RPEMHC, a new deep learning approach based on residue-residue pair encoding to predict the binding affinity between peptides and MHC, which encodes an MHC molecule and a peptide as a residue-residue pair map. We evaluate the performance of RPEMHC on various MHC-II-related datasets for MHC-peptide binding prediction, demonstrating that RPEMHC achieves better or comparable performance against other state-of-the-art baselines. Moreover, we conduct further experiments on MHC-I-related datasets, and the results demonstrate that our method works on both MHC classes. These extensive validations show that RPEMHC is an effective tool for studying MHC-peptide interactions and can potentially facilitate vaccine development. AVAILABILITY: The source code of the method along with trained models is freely available at https://github.com/lennylv/RPEMHC.
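
The abstract above describes encoding an MHC molecule and a peptide jointly as a residue-residue pair map but does not spell out the layout. The following is a minimal sketch of one plausible reading, assuming each cell simply stores an integer id for the (MHC residue, peptide residue) pair; the function name, the id scheme, and the toy sequences are illustrative assumptions, not RPEMHC's actual encoding.

```python
# Hypothetical pair-map encoding; NOT taken from the RPEMHC source code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def residue_pair_map(mhc_seq, peptide_seq):
    """Return a len(mhc_seq) x len(peptide_seq) grid of pair ids in [0, 400).

    Cell (i, j) identifies the pair formed by MHC residue i and peptide
    residue j, so both molecules are represented in a single 2D input.
    """
    n = len(AMINO_ACIDS)
    return [
        [AA_INDEX[m] * n + AA_INDEX[p] for p in peptide_seq]
        for m in mhc_seq
    ]


# Toy sequences, chosen only to show the shape of the output.
pair_map = residue_pair_map("GSHS", "AYL")
for row in pair_map:
    print(row)
```

A grid like this can be passed through an embedding layer and then 2D convolutions, which is one way a model could see both molecules at once instead of encoding them separately.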


Subjects
Deep Learning , Protein Binding , Peptides/chemistry , Major Histocompatibility Complex , Histocompatibility Antigens Class I/metabolism
3.
Environ Res ; 244: 117969, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38109956

ABSTRACT

Alkaline pre-treatment is known to enhance the acid production efficiency of sludge but adversely affects its dewatering performance. In this study, the improvement of sludge dewaterability by a novel bioleaching system inoculated with domesticated acidified sludge (AS), and its underlying mechanism, were investigated. The results showed that although the addition of Fe²⁺ and the reduction of pH each improved the dewatering performance of sludge, their effects were inferior to that of AS + Fe. The addition of AS and Fe²⁺ significantly reduced the specific resistance to filtration and the capillary suction time of the sludge by 98.6% and 95.5%, respectively. This improvement in dewatering performance was achieved through the combined actions of bio-acidification, bio-oxidation, and bio-flocculation. Remarkably, under alkaline pH, microorganisms in AS remained active, leading to the formation of iron-based bioflocculants, along with a rapid pH decrease. These bioflocculants combined with protein (PN) in tightly bound extracellular polymeric substances (TB-EPS) through amide bonding and transformed TB-EPS from an extractable to a non-extractable form, reducing PN content from 12.1 mg g⁻¹ DS to 5.09 mg g⁻¹ DS and altering the protein's secondary structure. Consequently, the gel-like TB-EPS matrix effectively broke down, releasing cellular water and significantly enhancing sludge dewaterability.


Subjects
Sewage , Water , Water/chemistry , Iron/chemistry , Filtration , Oxidation-Reduction , Fluid Waste Disposal/methods
4.
J Chem Inf Model ; 63(22): 7258-7271, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37931253

ABSTRACT

Phosphorylation, as one of the most important post-translational modifications, plays a key role in various cellular physiological processes and disease occurrences. In recent years, computational methods have been increasingly applied to the prediction of protein phosphorylation sites. However, most existing methods rely on simple protein sequence features that provide limited contextual information. To overcome this limitation, we propose DeepMPSF, a phosphorylation site prediction model based on multiple protein sequence features. There are two types of features: sequence semantic features, which comprise protein residue type information and relative position information within the protein sequence, and protein background biophysical features, which include global semantic information containing more comprehensive protein background information obtained from pretrained models. To extract these features, DeepMPSF employs two separate subnetworks, the SSFE module and the BBFE module, which automatically extract high-level semantic features. Our model incorporates a learning strategy for handling imbalanced datasets through ensemble learning during training and prediction. DeepMPSF is trained and evaluated on a well-established dataset of human proteins. Comparison with other benchmark methods shows that DeepMPSF outperforms them in predicting both S/T residues and Y residues. In particular, DeepMPSF showed excellent generalization in cross-species blind tests, with average improvements of 5.63%/5.72%, 22.28%/25.94%, 20.11%/17.49%, and 26.40%/28.33% on the Mus musculus/Rattus norvegicus test sets in the area under the ROC curve (AUC), the AUC of the PR curve, F1-score, and MCC, respectively. Furthermore, it also shows excellent performance in the latest updated case of natural proteins with functional phosphorylation sites. Through an ablation study and visual analysis, we find that the design of the different feature modules contributes significantly to the accurate classification of DeepMPSF, which provides valuable insights for predicting phosphorylation sites and offers effective support for future downstream research.


Subjects
Deep Learning , Mice , Animals , Humans , Rats , Phosphorylation , Proteins/chemistry , Amino Acid Sequence , Post-Translational Protein Processing
5.
Chemosphere ; 339: 139714, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37543234

ABSTRACT

Improving the dewatering performance of sewage sludge is of great scientific and engineering significance in the context of accelerated urbanization and increasingly strict environmental regulations. Acidified sludge (AS) can improve sludge dewatering performance, but the dewatering effect of repeated inoculation is unclear. The effects of long-term repeated inoculation of AS on sludge dewaterability were investigated. The molecular structure and microbial community succession of extracellular polymeric substances (EPS) are emphasized. The results revealed that increasing the inoculation ratio of AS reduced the pH, the absolute value of the sludge zeta potential, and the sludge particle size, and the decreasing trend became more evident with prolonged treatment time. Under the conditions of 30% and 50% AS inoculation, the dewatering performance of the sludge was significantly improved (p < 0.05). Compared with the raw sludge, the specific resistance of filtration (SRF) and capillary suction time at 30% inoculation were reduced by 64.3% and 50.1% after 30 cycles, respectively. Excluding loosely bound (LB)-EPS, soluble (S)-EPS and tightly bound (TB)-EPS exhibited a visible decrease, and the protein in TB-EPS was significantly related to sludge dewaterability (p < 0.05). The fluorescent components of aromatic protein and fulvic acid-like substances in TB-EPS were significantly associated with SRF, with a correlation coefficient of 0.99 (p < 0.05). Both the increase in the percentage of random coil and the decrease in α-helix in TB-EPS contributed to improving dewaterability. Increasing Firmicutes and decreasing Chloroflexi levels improved the sludge dewatering capacity. Repeated inoculation did not disrupt the dewatering effect of AS but rather increased the feasibility of the engineering application of AS. Considering both dewatering performance and cost, a 30% AS inoculation ratio is feasible for practical applications.


Subjects
Extracellular Polymeric Substance Matrix , Sewage , Sewage/chemistry , Molecular Structure , Water/chemistry , Proteins/chemistry , Fluid Waste Disposal/methods
6.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3623-3634, 2023.
Article in English | MEDLINE | ID: mdl-37607147

ABSTRACT

Accurate identification of RNA modification sites is of great significance in understanding the functions and regulatory mechanisms of RNAs. Recent advances have shown great promise in applying computational methods based on deep learning for accurate prediction of RNA modifications. However, those methods generally predicted only a single type of RNA modification. In addition, such methods lacked interpretability for their predicted results. In this work, a new Transformer-based deep learning method, referred to as TransRNAm, was proposed to predict multiple RNA modifications simultaneously. More specifically, TransRNAm employs a Transformer to extract contextual features and convolutional neural networks to further learn high-latent feature representations of RNA sequences relevant for RNA modifications. Importantly, by integrating the self-attention mechanism in the Transformer with a convolutional neural network, TransRNAm is capable of not only capturing the critical nucleotide sites that contribute significantly to RNA modification prediction, but also revealing the underlying association among different types of RNA modifications. Consequently, this work provided an accurate and interpretable predictor for multiple RNA modification prediction, which may contribute to uncovering the sequence-based forming mechanism of RNA modification sites.


Subjects
Deep Learning , Neural Networks (Computer) , Nucleotides , RNA/genetics
7.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2089-2100, 2023.
Article in English | MEDLINE | ID: mdl-37018301

ABSTRACT

Effectively and accurately predicting the effects of amino acid mutations on interactions between proteins is a key issue for understanding the mechanism of protein function and for drug design. In this study, we present a deep graph convolution (DGC) network-based framework, DGCddG, to predict the changes of protein-protein binding affinity after mutation. DGCddG incorporates multi-layer graph convolution to extract a deep, contextualized representation for each residue of the protein complex structure. The channels mined by DGC for the mutation sites are then fitted to the binding affinity with a multi-layer perceptron. Experimental results on multiple datasets show that our model can achieve relatively good performance for both single and multi-point mutations. For blind tests on datasets related to angiotensin-converting enzyme 2 (ACE2) binding with the SARS-CoV-2 virus, our method shows better results in predicting ACE2 changes and may help in finding favorable antibodies. Code and data availability: https://github.com/lennylv/DGCddG.


Subjects
COVID-19 , Humans , Protein Binding/genetics , COVID-19/genetics , SARS-CoV-2/genetics , Mutation/genetics , Point Mutation
8.
J Chem Inf Model ; 63(7): 2251-2262, 2023 04 10.
Article in English | MEDLINE | ID: mdl-36989086

ABSTRACT

Identifying the binding residues of protein-peptide complexes is essential for understanding protein function mechanisms and exploring drug discovery. Recently, many computational methods have been developed to predict the interaction sites of either protein or peptide. However, to our knowledge, no prediction method can simultaneously identify the interaction sites on both the protein and peptide sides. Here, we propose a deep graph convolutional network (GCN)-based method called GraphPPepIS to predict the interaction sites of protein-peptide complexes using protein and peptide structural information. We also propose a companion method, SeqPPepIS, to cope with the lack of structural information and the flexibility of peptides. SeqPPepIS replaces the peptide structural features in GraphPPepIS by learning features from peptide sequences. We performed a comprehensive evaluation on the benchmark data sets, and the results show that our two methods outperform state-of-the-art methods in accurately identifying the interaction sites on both the protein and peptide sides. We show that our methods can help improve protein-peptide docking. For docking data sets, our methods maintain robust performance in identifying binding sites, thereby enhancing the prediction of peptide binding poses. Finally, we visualized the protein and peptide graph embeddings to demonstrate the learning ability of graph convolution in predicting interaction sites, which was mainly obtained through the shared parameters of a protein graph and peptide graph.


Subjects
Benchmarking , Peptides , Amino Acid Sequence , Binding Sites , Drug Discovery
9.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36688724

ABSTRACT

MOTIVATION: Accurate and rapid prediction of protein-ligand binding affinity is a great challenge currently encountered in drug discovery. Recent advances have shown deep learning-based computational approaches to be a promising alternative for accurately quantifying binding affinity. The structural complementarity between the protein-binding pocket and the ligand has a great effect on the binding strength between a protein and a ligand, but most existing deep learning approaches extract the features of the pocket and the ligand with two separate modules. RESULTS: In this work, a new deep learning approach based on the cross-attention mechanism, named CAPLA, was developed for improved prediction of protein-ligand binding affinity by learning features from sequence-level information of both protein and ligand. Specifically, CAPLA employs the cross-attention mechanism to capture the mutual effect of the protein-binding pocket and the ligand. We evaluated the performance of CAPLA in comprehensive benchmarking experiments on binding affinity prediction, demonstrating the superior performance of CAPLA over state-of-the-art baseline approaches. Moreover, we provided interpretability for CAPLA by analyzing the attention scores generated by the cross-attention mechanism to uncover critical functional residues that contribute most to the binding affinity. Consequently, these results indicate that CAPLA is an effective approach for binding affinity prediction and may be useful for downstream applications. AVAILABILITY AND IMPLEMENTATION: The source code of the method along with trained models is freely available at https://github.com/lennylv/CAPLA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Ligands , Proteins/chemistry , Protein Binding , Software
10.
Article in English | MEDLINE | ID: mdl-35213314

ABSTRACT

Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, it remains a challenging task to make accurate predictions, as the small amount of training data and severe imbalanced classification reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighting loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a preprocessing of data augmentation with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our prediction on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
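
The abstract above reports F1 and MCC as "balance metrics" for a heavily imbalanced classification task. As a brief illustration of why such metrics are preferred over plain accuracy, the sketch below computes all three from confusion-matrix counts. The formulas are the standard definitions; the counts are made-up toy values, not results from the ctP2ISP paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy imbalanced case: 2 interface residues among 100, one of them found.
tp, tn, fp, fn = 1, 97, 1, 1
print(f"accuracy={accuracy(tp, tn, fp, fn):.3f}")
print(f"F1={f1_score(tp, fp, fn):.3f}")
print(f"MCC={mcc(tp, tn, fp, fn):.3f}")
```

In this toy case accuracy looks high because negatives dominate, while F1 and MCC stay moderate and so reflect the missed positive more faithfully.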


Subjects
Neural Networks (Computer) , Software , Benchmarking
11.
IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1594-1599, 2023.
Article in English | MEDLINE | ID: mdl-35471887

ABSTRACT

The binding of DNA sequences to cell type-specific transcription factors is essential for regulating gene expression in all organisms. Many variants occurring in these binding regions play crucial roles in human disease by disrupting the cis-regulation of gene expression. We first implemented a sequence-based deep learning model called deepBICS to quantify the intensity of transcription factors-DNA binding. The experimental results not only showed the superiority of deepBICS on ChIP-seq data sets but also suggested deepBICS as a language model could help the classification of disease-related and neutral variants. We then built a language model-based method called deepBICS4SNV to predict the pathogenicity of single nucleotide variants. The good performance of deepBICS4SNV on 2 tests related to Mendelian disorders and viral diseases shows the sequence contextual information derived from language models can improve prediction accuracy and generalization capability.


Subjects
Chromatin Immunoprecipitation Sequencing , Deep Learning , Humans , Virulence , Binding Sites/genetics , DNA/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Nucleotides
12.
J Chem Inf Model ; 62(23): 6258-6270, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36449561

ABSTRACT

Many computational methods have been proposed to predict drug-drug interactions (DDIs), which can occur when combining drugs to treat various diseases, but most mainly utilize single-source features of drugs, which is inadequate for drug representation. To fill this gap, we propose two attention-mechanism-based encoder-decoder models that incorporate multisource information: one is MAEDDI, which can predict DDIs, and the other is MAEDDIE, which can make further DDI-associated event predictions for drug pairs with DDIs. To better express the drug features, we used three encoding methods to encode the drugs, integrating the self-attention mechanism, cross-attention mechanism, and graph attention network to construct a multisource feature fusion network. Experiments showed that both MAEDDI and MAEDDIE performed better than some state-of-the-art methods in various validation attempts at different experimental tasks. The visualization analysis showed that the semantic features of drug pairs learned by our models provided a good drug representation. In practice, MAEDDIE successfully screened 43 DDI events on favipiravir, an influenza antiviral drug, with a success rate of nearly 50%. Our model achieved competitive results, mainly owing to the design of sequence-based, structural, biochemical, and statistical multisource features. Moreover, different encoders constructed based on different features learn the interrelationship information between drug pairs, and the different representations of these drug pairs are incorporated to predict the target problem. All of these encoders were designed to better characterize the complex DDI relationships, allowing us to achieve high generalization in DDI and DDI-associated event predictions.


Subjects
Semantics , Drug Interactions
13.
Genes (Basel) ; 13(11)2022 10 30.
Article in English | MEDLINE | ID: mdl-36360220

ABSTRACT

Nucleosome positioning plays a vital role in diverse cellular biological processes by regulating the accessibility of DNA sequences to DNA-binding proteins. Previous studies have shown that the intrinsic preference of nucleosomes for DNA sequences may play a dominant role in nucleosome positioning. As a consequence, it is nontrivial to develop computational methods based only on DNA sequence information to accurately identify nucleosome positioning, and thereby to verify the contribution of DNA sequences to nucleosome positioning. In this work, we propose a new deep learning-based method, named DeepNup, which enables us to improve the prediction of nucleosome positioning from DNA sequences alone. Specifically, we first use a hybrid feature encoding scheme that combines one-hot encoding and trinucleotide composition encoding to encode raw DNA sequences; afterwards, we employ multiscale convolutional neural network modules that consist of two parallel convolution kernels with different sizes and gated recurrent units to effectively learn the local and global correlation feature representations; lastly, we use a fully connected layer and a sigmoid unit serving as a classifier to integrate these learned high-order feature representations and generate the final prediction outcomes. On two benchmark nucleosome positioning datasets, DeepNup achieves better performance for nucleosome positioning prediction than several state-of-the-art methods according to the experimental evaluation metrics. These results demonstrate that DeepNup is a powerful deep learning-based tool that enables one to accurately identify potential nucleosome sequences.
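
The abstract above names a hybrid encoding that combines one-hot and trinucleotide composition features. The following is a minimal sketch of what those two encodings commonly look like, assuming upper-case sequences over A/C/G/T and overlapping 3-mer windows; the function names and the toy sequence are illustrative, and how DeepNup actually combines the two encodings is not stated in the abstract.

```python
from itertools import product

BASES = "ACGT"
# All 64 possible trinucleotides in a fixed order: AAA, AAC, ..., TTT.
TRIMERS = ["".join(t) for t in product(BASES, repeat=3)]


def one_hot(seq):
    """Encode each base as a length-4 indicator vector (order A, C, G, T)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def trinucleotide_composition(seq):
    """Return the frequency of each of the 64 overlapping 3-mers in seq."""
    counts = dict.fromkeys(TRIMERS, 0)
    windows = len(seq) - 2
    for i in range(windows):
        counts[seq[i:i + 3]] += 1
    if windows <= 0:
        return [0.0] * len(TRIMERS)
    return [counts[t] / windows for t in TRIMERS]


seq = "ACGTAC"  # toy sequence, assumed to contain only A, C, G, T
print(one_hot(seq)[:2])
tnc = trinucleotide_composition(seq)
print(len(tnc), tnc[TRIMERS.index("ACG")])
```

One-hot keeps position-by-position information for the convolutional layers, while the 64-dimensional composition vector summarizes the sequence as a whole.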


Subjects
Nucleosomes , Saccharomyces cerevisiae , Nucleosomes/genetics , Nucleosomes/metabolism , Base Sequence , Saccharomyces cerevisiae/genetics , Chromatin Assembly and Disassembly , Neural Networks (Computer)
14.
Bioinformatics ; 38(17): 4070-4077, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35809058

ABSTRACT

MOTIVATION: Histone modifications are epigenetic markers that impact gene expression by altering the chromatin structure or recruiting histone modifiers. Their accurate identification is key to unraveling the mechanisms by which they regulate gene expression. However, the solutions for this task can be improved by exploiting multiple relationships from dataset and exploring designs of learning models, for example jointly learning technology. RESULTS: This article proposes a deep learning-based multi-objective computational approach, iHMnBS, to identify which of the seven typical histone modifications a DNA sequence may choose to bind, and which parts of the DNA sequence bind to them. iHMnBS employs a customized dataset that allows the marking of modifications contained in histones that may bind to any position in the DNA sequence. iHMnBS tries to mine the information implicit in this richer data by means of deep neural networks. In comprehensive comparisons, iHMnBS outperforms a baseline method, and the probability of binding to modified histones assigned to a representative nucleotide of a DNA sequence can serve as a reference for biological experiments. Since the interaction between transcription factors and histone modifications has an important role in gene expression, we extracted a number of sequence patterns that may bind to transcription factors, and explored their possible impact on disease. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/lennylv/iHMnBS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Histones , Histones/metabolism , Base Sequence , Binding Sites , DNA/chemistry , Transcription Factors/metabolism
15.
Bioinformatics ; 38(10): 2705-2711, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561183

ABSTRACT

MOTIVATION: Protein structure can be severely disrupted by frameshift and non-sense mutations at specific positions in the protein sequence. Frameshift and non-sense mutation cases can also be found in healthy individuals. A method to distinguish neutral and potentially disease-associated frameshift and non-sense mutations is of practical and fundamental importance. It would allow researchers to rapidly screen out the potentially pathogenic sites from a large number of mutated genes and then use these sites as drug targets to speed up diagnosis and improve access to treatment. The problem of how to distinguish between neutral and potentially disease-associated frameshift and non-sense mutations remains under-researched. RESULTS: We built a Transformer-based neural network model to predict the pathogenicity of frameshift and non-sense mutations on protein features and named it TransPPMP. The feature matrix of contextual sequences computed by the ESM pre-training model, the type of mutation residue, and auxiliary features, including structure and function information, are combined as input features, and a focal loss function is used to address the sample imbalance problem during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP showed robust performance and clear advantages in all evaluation metrics compared with four other advanced methods, namely, ENTPRISE-X, VEST-indel, DDIG-in and CADD. In addition, we demonstrate the usefulness of the multi-head attention mechanism in the Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but functional sites with a large influence on the mutated residue can also be captured by attention focus. These could offer useful clues for studying the pathogenicity mechanisms of complex human diseases for which traditional machine learning methods fall short. AVAILABILITY AND IMPLEMENTATION: TransPPMP is available at https://github.com/lennylv/TransPPMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
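
The abstract above mentions a focal loss for the sample imbalance problem without giving its form. The sketch below shows the standard binary focal loss, FL = -α_t (1 - p_t)^γ log(p_t), for a single prediction. The defaults γ = 2 and α = 0.25 and the function name are assumptions for illustration; the values used by TransPPMP are not stated in the abstract.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one example.

    p: predicted probability of the positive class.
    y: true label (1 or 0).
    gamma: focusing parameter; gamma = 0 removes the down-weighting.
    alpha: class weight applied to positives (1 - alpha to negatives).
    """
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # guard against log(0)
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


# A well-classified positive contributes far less than a misclassified one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy={easy:.6f} hard={hard:.6f}")
```

The (1 - p_t)^γ factor shrinks the loss of confidently correct examples, so training concentrates on the hard, often minority-class, cases.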


Subjects
Frameshift Mutation , Software , Humans , Mutation , Neural Networks (Computer)
16.
Genes (Basel) ; 13(4)2022 03 23.
Article in English | MEDLINE | ID: mdl-35456374

ABSTRACT

A large number of inorganic and organic compounds are able to bind DNA and form complexes, among which drug-related molecules are important. Chromatin accessibility changes not only directly affect drug-DNA interactions, but they can promote or inhibit the expression of the critical genes associated with drug resistance by affecting the DNA binding capacity of TFs and transcriptional regulators. However, the biological experimental techniques for measuring it are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome. Existing computational models mostly ignore the contextual information provided by the bases in gene sequences. To address these issues, we proposed a new solution called SemanticCAP. It introduces a gene language model that models the context of gene sequences and is thus able to provide an effective representation of a certain site in a gene sequence. Basically, we merged the features provided by the gene language model into our chromatin accessibility model. During the process, we designed methods called SFA and SFC to make feature fusion smoother. Compared to DeepSEA, gkm-SVM, and k-mer using public benchmarks, our model proved to have better performance, showing a 1.25% maximum improvement in auROC and a 2.41% maximum improvement in auPRC.


Subjects
Chromatin , Language , Chromatin/genetics , Chromatin Immunoprecipitation , DNA/genetics , Transcription Factors/genetics
17.
Article in English | MEDLINE | ID: mdl-32976105

ABSTRACT

Deep learning has been successfully applied to surprisingly different domains. Researchers and practitioners are employing trained deep learning models to enrich our knowledge. Transcription factors (TFs) are essential for regulating gene expression in all organisms by binding to specific DNA sequences. Here, we designed a deep learning model named SemanticCS (Semantic ChIP-seq) to predict TF binding specificities. We trained our learning model on an ensemble of ChIP-seq datasets (Multi-TF-cell) to learn useful intermediate features across multiple TFs and cells. To interpret these feature vectors, visualization analysis was used. Our results indicate that these learned representations can be used to train shallow machines for other tasks. Using diverse experimental data and evaluation metrics, we show that SemanticCS outperforms other popular methods. In addition, from experimental data, SemanticCS can help to identify the substitutions that cause regulatory abnormalities and to evaluate the effect of substitutions on the binding affinity for the RXR transcription factor. The online server for SemanticCS is freely available at http://qianglab.scst.suda.edu.cn/semanticCS/.


Subjects
Chromatin Immunoprecipitation Sequencing , Transcription Factors , Base Sequence , Binding Sites/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
18.
IEEE J Biomed Health Inform ; 25(7): 2811-2819, 2021 07.
Article in English | MEDLINE | ID: mdl-33571101

ABSTRACT

The control of the coordinated expression of genes is primarily regulated by the interactions between transcription factors (TFs) and their DNA binding sites, which are an integral part of transcriptional regulatory networks. There are many computational tools focused on determining TF binding or unbinding to a DNA sequence. However, other tools focused on further determining the relative preference of such binding are needed. Here, we propose a regression model with deep learning, called SemanticBI, to predict intensities of TF-DNA binding. SemanticBI is a convolutional neural network (CNN)-recurrent neural network (RNN) architecture model that was trained on an ensemble of protein binding microarray data sets that covered multiple TFs. Using this approach, SemanticBI exhibited superior accuracy in predicting binding intensities compared to other popular methods. Moreover, SemanticBI uncovered vectorized sequence-oriented features using its CNN-RNN architecture, which is an abstract representation of the original DNA sequences. Additionally, the use of SemanticBI raises the question of whether motifs are necessary for computational models of TF binding. The online SemanticBI service can be accessed at http://qianglab.scst.suda.edu.cn/semantic/.


Subjects
Algorithms , Computational Biology , Binding Sites , DNA/genetics , Humans , Protein Binding , Transcription Factors/genetics
19.
Sci Total Environ ; 765: 144375, 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33385815

ABSTRACT

Heavy metals (HMs) are constantly released into the environment during the production and use of batteries. Battery manufacturing has been ongoing for over six decades in the "Battery Industrial Capital" of China (located in Xinxiang City), but the potential pathways by which residents in this region are exposed to HMs remain unclear. To clarify the exposure pathways and health risk of human exposure to HMs, hand wipe samples (n=82) and fingernail samples (n=36) were collected from residents (including young children (0-6 years old), children (7-12 years old) and adults (30-60 years old)) living around battery factories. The total concentrations of the target HMs (Zn, Mn, Cu, Pb, Ni, Cr, Cd, Co) in hand wipes ranged from 133 to 8040 µg/m², and those in fingernails ranged from 9.7 to 566 µg/g. HM levels in the hand wipe and fingernail samples both decreased with age, and higher HM levels were observed for males than for females. The HM composition profiles in these two matrices showed a high degree of similarity, with Zn as the predominant element, and thus, oral ingestion and dermal exposure via dust were expected to be the most important HM exposure pathways for residents in this region. The non-carcinogenic risks (HQs) from dermal and oral ingestion exposure to Cd, Cr, and Pb were higher than those of the other five elements for all three populations, and the HQderm of Cd for young children was 2.1 (HQoral=0.6). Moreover, the hazard index (HI) values of ∑8HMs for young children (HItotal=5.2, HIoral=2.0, HIdermal=3.2) and children (HItotal=1.6, HIoral=1.3, HIdermal=0.3) exceeded the safe threshold (1.0). Therefore, young children and children should be prioritized for protection from HM pollution, and more attention should be paid to young children's dermal exposure to Cd in this region.
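
The hazard index values reported in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes that HItotal is the sum of the oral and dermal hazard indices, which is consistent with the reported numbers but is not stated explicitly in the abstract; the variable and function names are illustrative.

```python
# Values copied from the abstract; the additive relation is an assumption.
REPORTED = {
    "young children": {"oral": 2.0, "dermal": 3.2, "total": 5.2},
    "children": {"oral": 1.3, "dermal": 0.3, "total": 1.6},
}
SAFE_THRESHOLD = 1.0  # threshold quoted in the abstract


def hazard_index_total(oral, dermal):
    """Sum route-specific hazard indices into a total hazard index."""
    return oral + dermal


for group, hi in REPORTED.items():
    total = round(hazard_index_total(hi["oral"], hi["dermal"]), 1)
    matches = total == hi["total"]
    exceeds = total > SAFE_THRESHOLD
    print(f"{group}: HItotal={total} matches_reported={matches} exceeds={exceeds}")
```

For both groups the sum reproduces the reported HItotal, and both totals lie above the threshold of 1.0, in line with the abstract's conclusion.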


Subjects
Dust , Heavy Metals , Adult , Child , Preschool Child , China , Cities , Dust/analysis , Environmental Monitoring , Female , Humans , Infant , Newborn Infant , Male , Heavy Metals/analysis , Middle Aged , Risk Assessment
20.
Cell Host Microbe ; 27(3): 325-328, 2020 03 11.
Article in English | MEDLINE | ID: mdl-32035028

ABSTRACT

An in-depth annotation of the newly discovered coronavirus (2019-nCoV) genome has revealed differences between 2019-nCoV and severe acute respiratory syndrome (SARS) or SARS-like coronaviruses. A systematic comparison identified 380 amino acid substitutions between these coronaviruses, which may have caused functional and pathogenic divergence of 2019-nCoV.


Subjects
Betacoronavirus/classification , Coronavirus Infections/virology , Viral Genome , Phylogeny , Viral Pneumonia/virology , Amino Acid Substitution , COVID-19 , China , Middle East Respiratory Syndrome Coronavirus , Pandemics , Severe Acute Respiratory Syndrome-Related Coronavirus , SARS-CoV-2