Results 1 - 20 of 29
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38271483

ABSTRACT

The advent of single-cell sequencing technologies has revolutionized cell biology studies. However, integrative analyses of diverse single-cell data face serious challenges, including technological noise, sample heterogeneity, and different modalities and species. To address these problems, we propose scCorrector, a variational autoencoder-based model that can integrate single-cell data from different studies and map them into a common space. Specifically, we designed a Study Specific Adaptive Normalization for each study in the decoder to implement these features. scCorrector achieves competitive and robust performance compared with state-of-the-art methods and brings novel insights under various circumstances (e.g. various batches, multi-omics, cross-species, and development stages). In addition, the integration of single-cell data and spatial data makes it possible to transfer information between different studies, which greatly expands the narrow range of genes covered by MERFISH technology. In summary, scCorrector can efficiently integrate multi-study single-cell datasets, thereby providing broad opportunities to tackle challenges emerging from noisy resources.

2.
BMC Bioinformatics ; 24(1): 481, 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38104057

ABSTRACT

BACKGROUND: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. RESULTS: To address these issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models on multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that with a knowledge prior, the training process can be significantly accelerated. Finally, we conducted interpretability analysis for each dimension (pathway) of the cell representation in the embedding space. CONCLUSIONS: The results showed that the cell representations obtained by scInterpreter are biologically meaningful. Through weight sorting, we found several new genes related to pathways in the PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool. It is expected that scInterpreter will bring great convenience to the study of single-cell transcriptomics.


Subject(s)
Leukocytes, Mononuclear , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Leukocytes, Mononuclear/metabolism , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis
3.
Front Genet ; 14: 1283404, 2023.
Article in English | MEDLINE | ID: mdl-37867600

ABSTRACT

Introduction: CircRNA-protein binding plays a critical role in complex biological activity and disease. Various deep learning-based algorithms have been proposed to identify CircRNA-protein binding sites. These methods predict at the sequence level whether a CircRNA sequence includes protein binding sites, and primarily concentrate on analysing the sequence specificity of CircRNA-protein binding. In terms of model performance, these methods are unsatisfactory at accurately predicting motif sites that have special functions in gene expression. Methods: In this study, based on the deep learning models that implement pixel-level binary classification in computer vision, we viewed CircRNA-protein binding site prediction as a nucleotide-level binary classification task, and used a fully convolutional neural network to identify CircRNA-protein binding motif sites (CPBFCN). Results: CPBFCN provides a new path to predict CircRNA motifs. Based on the MEME tool and the existing CircRNA-related and protein-related databases, we analysed the functions of the motifs discovered by CPBFCN. We also investigated the correlation between CircRNA sponges and motif distribution. Furthermore, by comparing the motif distribution with different input sequence lengths, we found that some motifs in the flanking sequences of the CircRNA-protein binding region may contribute to CircRNA-protein binding. Conclusion: This study contributes to identifying circRNA-protein binding and helps in understanding the role of circRNA-protein binding in gene expression regulation.

4.
Front Microbiol ; 14: 1238199, 2023.
Article in English | MEDLINE | ID: mdl-37675425

ABSTRACT

Introduction: Imbalances in gut microbes have been implicated in many human diseases, including colorectal cancer (CRC), inflammatory bowel disease, type 2 diabetes, obesity, autism, and Alzheimer's disease. Compared with other human diseases, CRC is a gastrointestinal malignancy with high mortality and a high probability of metastasis. However, current studies mainly focus on the prediction of colorectal cancer while neglecting the more serious malignancy of metastatic colorectal cancer (mCRC). In addition, high dimensionality and small sample sizes make gut microbial data complex, which increases the difficulty for traditional machine learning models. Methods: To address these challenges, we collected and processed 16S rRNA data and calculated abundance data from patients with non-metastatic colorectal cancer (non-mCRC) and mCRC. In contrast to the traditional health-disease classification strategy, we adopted a novel disease-disease classification strategy and proposed a microbiome-based multi-view convolutional variational information bottleneck (MV-CVIB). Results: The experimental results show that MV-CVIB can effectively predict mCRC. In comparisons with other state-of-the-art models, MV-CVIB achieves AUC values above 0.9. MV-CVIB also achieved satisfactory predictive performance on multiple published CRC gut microbiome datasets. Discussion: Finally, multiple gut microbiota analyses were used to elucidate communities and differences between mCRC and non-mCRC, and the metastatic properties of CRC were assessed by patient age and microbiota expression.

5.
PLoS Comput Biol ; 19(8): e1011344, 2023 08.
Article in English | MEDLINE | ID: mdl-37651321

ABSTRACT

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as in identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize biometric information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and a dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms other competing methods. In addition, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.
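
Two of the similarity measures named in this abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each circRNA is described by its binary row of a circRNA-by-disease association matrix; the function names, the toy matrix, and the bandwidth parameter `gamma_prime` are illustrative assumptions and are not taken from iCircDA-NEAE.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between binary association profiles.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), where gamma is gamma_prime
    divided by the mean squared norm of all profiles (a common normalisation,
    assumed here rather than taken from the paper).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(profiles[i], profiles[j]))
             for j in range(n)] for i in range(n)]

def jaccard(a, b):
    """Jaccard similarity of two binary profiles: |intersection| / |union|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

# Hypothetical toy data: 3 circRNAs (rows) x 3 diseases (columns).
profiles = [[1, 1, 0],
            [1, 0, 0],
            [0, 0, 1]]
K = gip_kernel(profiles)
J01 = jaccard(profiles[0], profiles[1])  # one shared disease out of two
```

The same two functions apply to disease profiles by passing the columns of the matrix instead of the rows.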


Subject(s)
Algorithms , RNA, Circular , Humans , RNA, Circular/genetics , Semantics
6.
IEEE J Biomed Health Inform ; 27(9): 4611-4622, 2023 09.
Article in English | MEDLINE | ID: mdl-37368803

ABSTRACT

The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Efficient therapeutic peptide prediction is critical for peptide drug discovery. However, most of the existing methods only make effective predictions for one class of therapeutic peptides. It is worth noting that currently no predictive method considers sequence length information as a distinct feature of therapeutic peptides. In this article, a novel deep learning approach with matrix factorization for predicting therapeutic peptides (DeepTPpred) by integrating length information is proposed. The matrix factorization layer can learn the latent features of the encoded sequence through a mechanism of compression followed by restoration. The length features of the therapeutic peptide sequences are embedded together with the encoded amino acid sequences. To automatically learn therapeutic peptide predictions, these latent features are input into neural networks with a self-attention mechanism. On eight therapeutic peptide datasets, DeepTPpred achieved excellent prediction results. Based on these datasets, we first integrated the eight datasets to obtain a full therapeutic peptide integration dataset. Then, we obtained two functional integration datasets based on the functional similarity of the peptides. Finally, we also conducted experiments on the latest versions of the ACP and CPP datasets. Overall, the experimental results show that our work is effective for the identification of therapeutic peptides.


Subject(s)
Deep Learning , Humans , Peptides/chemistry , Neural Networks, Computer , Drug Discovery
7.
Article in English | MEDLINE | ID: mdl-35389869

ABSTRACT

DNA-binding proteins (DBPs) play vital roles in the regulation of biological systems. Although there are already many deep learning methods for predicting the sequence specificities of DBPs, they face two challenges. First, classic deep learning methods for DBP prediction usually fail to capture the dependencies between genomic sequences since their commonly used one-hot codes are mutually orthogonal. Second, these methods usually perform poorly when samples are inadequate. To address these two challenges, we developed a novel language model for mining DBPs using human genomic data and ChIP-seq datasets with decaying learning rates, named DNA Fine-tuned Language Model (DFLM). It can capture the dependencies between genome sequences based on the context of human genomic data and then fine-tune the features of DBP tasks using different ChIP-seq datasets. First, we compared DFLM with the existing widely used methods on 69 datasets and achieved excellent performance. Moreover, we conducted comparative experiments on complex DBPs and small datasets. The results show that DFLM still achieved a significant improvement. Finally, through visualization analysis of one-hot encoding and DFLM, we found that one-hot encoding completely cuts off the dependencies within DNA sequences, while DFLM, which uses language models, can represent these dependencies well. Source code is available at: https://github.com/Deep-Bioinfo/DFLM.
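
The remark that one-hot codes are mutually orthogonal can be checked directly. The following is a minimal sketch, not code from DFLM; the helper names `one_hot` and `dot` are illustrative.

```python
# One-hot code for each DNA base: a unit vector along its own axis.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [ONE_HOT[base] for base in seq.upper()]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

encoded = one_hot("ACGT")

# Every pair of distinct bases has inner product 0, so the encoding
# itself carries no notion of similarity or context between bases.
pairwise = {(a, b): dot(ONE_HOT[a], ONE_HOT[b])
            for a in ONE_HOT for b in ONE_HOT if a != b}
```

Because every off-diagonal inner product is zero, any dependency between positions has to be learned by the downstream model, which is the limitation the abstract contrasts with learned embeddings.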


Subject(s)
Algorithms , DNA-Binding Proteins , Humans , Genomics , DNA/genetics , Genome
8.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36484687

ABSTRACT

MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent studies have provided evidence that such cell-type-specific binding is determined by TFs' intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. However, computational prediction and characterization of cell-type-specific and shared binding sites have rarely been studied. RESULTS: In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features, one based on XGBoost and the other on a convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, each of which was further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform other competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites by excluding or including DNase signals. Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment. AVAILABILITY AND IMPLEMENTATION: The source code is available at: https://github.com/turningpoint1988/CSSBS.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin , Transcription Factors , Humans , Protein Binding/genetics , Binding Sites/genetics , Transcription Factors/metabolism , Chromatin Immunoprecipitation Sequencing , Computational Biology/methods
9.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2690-2699, 2023.
Article in English | MEDLINE | ID: mdl-36374878

ABSTRACT

Transcription factors (TFs) play a part in gene expression. TFs can form a complex gene expression regulation system by binding to DNA. Identifying the binding regions has therefore become an indispensable step in understanding the regulatory mechanism of gene expression. Owing to the great achievements of deep learning (DL) in computer vision and language processing in recent years, many researchers have been inspired to use these methods to predict TF binding sites (TFBSs), achieving extraordinary results. However, these methods mainly focus on whether DNA sequences include TFBSs. In this paper, we propose a fully convolutional network (FCN) coupled with a refinement residual block (RRB) and a global average pooling layer (GAPL), namely FCNARRB. Our model can classify binding sequences at the nucleotide level by outputting dense labels for the input data. Experimental results on human ChIP-seq datasets show that the RRB and GAPL structures are very useful for improving model performance. Adding GAPL improves the performance by 9.32% and 7.61% in terms of IoU (Intersection over Union) and PRAUC (Area Under the Precision-Recall Curve), and adding RRB improves the performance by 7.40% and 4.64%, respectively. In addition, we find that conservation information can help locate TFBSs.
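
For nucleotide-level (dense) labels, IoU compares the set of positions predicted as bound with the set annotated as bound. A minimal sketch follows; the function name, the toy label vectors, and the convention of returning 1.0 when both label sets are empty are assumptions for illustration, not details of FCNARRB.

```python
def iou(pred, truth):
    """Intersection over Union for two binary per-nucleotide label vectors."""
    if len(pred) != len(truth):
        raise ValueError("label vectors must have the same length")
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    # Convention assumed here: two empty label sets count as a perfect match.
    return inter / union if union else 1.0

# Hypothetical 8-nt sequence: the true site covers positions 2-5,
# the prediction covers positions 1-4.
truth = [0, 0, 1, 1, 1, 1, 0, 0]
pred  = [0, 1, 1, 1, 1, 0, 0, 0]
score = iou(pred, truth)  # 3 shared positions / 5 positions in the union = 0.6
```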

10.
RSC Adv ; 12(41): 26953-26965, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36320854

ABSTRACT

To improve the poor stability of nano zero-valent iron (nZVI), corn-straw biochar (BC) was used as a support for the synthesis of composites of nZVI-biochar (nZVI/BC) in different mass ratios. After a thorough characterization, the obtained nZVI/BC composite was used to remove hexavalent chromium [Cr(VI)] in an aquatic system under varying conditions including composite amount, Cr(VI) concentration, and pH. The obtained results show that the treatment efficiency varied in the following order: nZVI-BC (1:3) > nZVI-BC (1:5) > nZVI alone > BC alone. This order indicates the higher efficiency of composite material and the positive effect of nZVI content in the composite. Similarly, the composite dosage and Cr(VI) concentration had significant effects on the removal performance and 2 g L⁻¹ and 6 g L⁻¹ were considered to be the optimum dose at a Cr(VI) concentration of 20 mg L⁻¹ and 100 mg L⁻¹, respectively. The removal efficiency was maximum (100%) at pH 2 whereas solution pH increased significantly after the reaction (from 2 to 4.13). The removal kinetics of Cr(VI) was described by a pseudo-second-order model which indicated that the removal process was mainly controlled by the rate of chemical adsorption. The thermodynamics was more in line with the Freundlich model which indicated that the removal was multi-molecular layer adsorption. TEM-EDS, XRD, and XPS were applied to characterize the crystal lattice and structural changes of the material to specify the interfacial chemical behaviour on the agent surface. These techniques demonstrate that the underlying mechanisms of Cr(VI) removal include adsorption, chemical reduction-oxidation reaction, and co-precipitation on the surface of the nZVI-BC composite. The results indicated that the corn-straw BC as a carrier material highly improved Cr(VI) removal performance of nZVI and offered better utilization of the corn straw.
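
For reference, the two models named in this abstract are usually written in the standard textbook forms below. The symbols are the conventional ones and are assumptions here, since the abstract gives no notation: $q_t$ and $q_e$ are the amounts adsorbed at time $t$ and at equilibrium, $k_2$ is the pseudo-second-order rate constant, $C_e$ is the equilibrium concentration, and $K_F$ and $n$ are the Freundlich constants.

```latex
% Pseudo-second-order kinetics: rate law and its linearised form
\frac{\mathrm{d}q_t}{\mathrm{d}t} = k_2\,(q_e - q_t)^2
\quad\Longrightarrow\quad
\frac{t}{q_t} = \frac{1}{k_2\,q_e^{2}} + \frac{t}{q_e}

% Freundlich isotherm and its linearised form
q_e = K_F\,C_e^{1/n}
\quad\Longrightarrow\quad
\ln q_e = \ln K_F + \frac{1}{n}\,\ln C_e
```

A straight line in a plot of $t/q_t$ against $t$ is the usual evidence that the pseudo-second-order model describes the data, with slope $1/q_e$ and intercept $1/(k_2 q_e^2)$.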

11.
PLoS Comput Biol ; 18(10): e1010572, 2022 10.
Article in English | MEDLINE | ID: mdl-36206320

ABSTRACT

In recent years, major advances have been made in various chromosome conformation capture technologies to further satisfy the needs of researchers for high-quality, high-resolution contact interactions. Discriminating loops from genome-wide contact interactions is crucial for dissecting three-dimensional (3D) genome structure and function. Here, we present a deep learning method to predict genome-wide chromatin loops, called DLoopCaller, by combining accessible chromatin landscapes and raw Hi-C contact maps. Available orthogonal data from ChIA-PET/HiChIP and Capture Hi-C were used to generate positive samples with a wider contact matrix, which makes it possible to find more potential genome-wide chromatin loops. The experimental results demonstrate that DLoopCaller effectively improves the accuracy of predicting genome-wide chromatin loops compared to the state-of-the-art method Peakachu. Moreover, compared to two of the most popular loop callers, HiCCUPS and Fit-Hi-C, DLoopCaller identifies some unique interactions. We conclude that a combination of chromatin landscapes on the one-dimensional genome contributes to understanding the 3D genome organization, and the identified chromatin loops reveal cell-type specificity and transcription factor motif co-enrichment across different cell lines and species.


Subject(s)
Chromatin , Deep Learning , Chromatin/genetics , Genome/genetics , Chromosomes , Transcription Factors/genetics
12.
BMC Genomics ; 23(1): 581, 2022 Aug 12.
Article in English | MEDLINE | ID: mdl-35962324

ABSTRACT

BACKGROUND: Circular RNAs (CircRNAs) play critical roles in gene expression regulation and disease development. Understanding the regulatory mechanism of CircRNA formation can help reveal the role of CircRNAs in these biological processes. Back-splicing is important for CircRNA formation, and predicting back-splicing sites helps uncover how CircRNAs form. Several methods have been proposed for back-splicing site prediction or circRNA-related prediction tasks, but their performance was constrained by a poor ability to learn and use features. RESULTS: In this study, CircCNN was proposed to predict pre-mRNA back-splicing sites. A convolutional neural network and batch normalization are the main parts of CircCNN. Experimental results on three datasets show that CircCNN outperforms other baseline models. Moreover, PPM (Position Probability Matrix) features extracted by CircCNN were converted into motifs. Further analysis reveals that some of the motifs found by CircCNN match known motifs involved in gene expression regulation, and that the distribution of motifs and special short sequences is important for pre-mRNA back-splicing. CONCLUSIONS: In general, the findings in this study provide a new direction for exploring CircRNA-related gene expression regulatory mechanisms and identifying potential targets for complex malignant diseases. The datasets and source code of this study are freely available at: https://github.com/szhh521/CircCNN .
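
A position probability matrix simply records, for each position of a set of aligned sites, the fraction of sites carrying each base. The sketch below shows only that definition; how CircCNN selects the subsequences that feed its PPMs is not described in the abstract, and the function name and the example sites are hypothetical.

```python
def position_probability_matrix(sites, alphabet="ACGU"):
    """Build a PPM from equal-length aligned sequences.

    Returns one dict per position, mapping each base to its relative
    frequency at that position.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    ppm = []
    for i in range(length):
        column = [s[i] for s in sites]
        ppm.append({base: column.count(base) / len(sites) for base in alphabet})
    return ppm

# Hypothetical aligned 4-nt RNA sites, for illustration only.
sites = ["GUAA", "GUGA", "GUAG", "AUAA"]
ppm = position_probability_matrix(sites)
# ppm[0] has G at 0.75 and A at 0.25; ppm[1] has U at 1.0.
```

Each row of the result sums to 1, which is what allows a PPM to be rendered as a sequence logo or compared against motif databases.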


Subject(s)
RNA Precursors , RNA, Circular , Gene Expression Regulation , Neural Networks, Computer , RNA Precursors/metabolism , RNA Splicing
13.
Photosynth Res ; 153(3): 177-189, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35834037

ABSTRACT

Iris tectorum Maxim. is an important plant that plays a crucial role in the ecological welfare of wetlands. In this study, the effects of different intensities of UV-B radiation on the growth, photosynthetic pigment content, chlorophyll fluorescence characteristics, chloroplast ultrastructure, and gas exchange parameters of Iris tectorum Maxim. were studied. The results showed that enhanced UV-B radiation had a significant influence on the above-mentioned parameters of iris. Compared with the control, enhanced UV-B radiation caused certain damage to the leaf appearance. With increasing radiation intensity, the apparent damage became more serious. Enhanced UV-B radiation significantly decreased leaf chlorophyll contents, and the effect accumulated with the exposure time. Enhanced UV-B radiation increased Fo, significantly increased the non-photochemical quenching coefficient NPQ, reduced PSII and Qp, and significantly decreased the Fm, Fv/Fm, and Fv/Fo in leaves. The effect of UV-B radiation on PSII destruction of Iris tectorum Maxim. increased as the radiation intensity increased and the exposure time was prolonged. The chloroplast structure was damaged under the enhanced UV-B radiation. More specifically, thylakoid lamellae were distorted, swollen and even blurred, and a large number of starch granules appeared. The effect of high-intensity radiation on chloroplast ultrastructure was greater than that of lower intensity. Enhanced UV-B radiation significantly reduced the net photosynthetic rate, stomatal conductance, and transpiration rate, and the degree of reduction increased with increasing irradiation intensity. However, the intercellular CO2 content increased, which suggests that the decrease in photosynthetic rate was mainly due to non-stomatal factors.


Subject(s)
Iris Plant , Carbon Dioxide/metabolism , Chlorophyll/metabolism , Iris Plant/metabolism , Photosynthesis/physiology , Plant Leaves/physiology , Starch/metabolism
14.
RSC Adv ; 12(22): 13695-13705, 2022 May 05.
Article in English | MEDLINE | ID: mdl-35530389

ABSTRACT

In this study, raw attapulgite and two aluminium hydroxide-modified attapulgites prepared using different aluminium salts were calcined at 600 °C to successfully prepare three novel adsorbents (C-ATP, C-ATP-SO₄²⁻ and C-ATP-Cl⁻). The three adsorbents were characterized by transmission electron microscopy (TEM), Fourier transform infrared spectroscopy (FTIR), X-ray diffraction (XRD), Brunauer-Emmett-Teller (BET) analysis and X-ray photoelectron spectroscopy (XPS). Batch experiments revealed that the Cd(II) adsorption capacity of the three adsorbents increased with increasing pH, increasing the initial concentration of Cd(II) in solution, and with longer adsorption times. The order of adsorption capacity was always C-ATP > C-ATP-Cl⁻ > C-ATP-SO₄²⁻. C-ATP and C-ATP-Cl⁻ were better described by the Langmuir model, while C-ATP-SO₄²⁻ was better described by the Freundlich model. The three adsorbents reached adsorption equilibrium within 2 h, and all followed pseudo-second order kinetics. The adsorption of Cd(II) onto the three adsorbents was physisorption, as suggested by the calculated thermodynamic parameters. Although the adsorption of Cd(II) on C-ATP and C-ATP-Cl⁻ was exothermic, the adsorption on C-ATP-SO₄²⁻ was endothermic. Ion exchange and cadmium precipitation were the primary mechanisms of cadmium adsorption on the three adsorbents analysed by XPS. The presence of SO₄²⁻ in C-ATP-SO₄²⁻ may result in weaker binding of Cd(II) by the adsorbent than C-ATP-Cl⁻.
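
The Langmuir model and the thermodynamic parameters mentioned in this abstract are conventionally expressed as follows. These are the standard textbook relations; the symbols are assumptions, as the abstract defines none: $q_e$ is the equilibrium uptake, $q_{\max}$ the monolayer capacity, $K_L$ the Langmuir constant, $C_e$ the equilibrium concentration, and $K$ the adsorption equilibrium constant at temperature $T$.

```latex
% Langmuir isotherm (monolayer adsorption on equivalent sites)
q_e = \frac{q_{\max}\,K_L\,C_e}{1 + K_L\,C_e}

% Thermodynamic parameters from the temperature dependence of K
\Delta G^{\circ} = -RT \ln K,
\qquad
\ln K = -\frac{\Delta H^{\circ}}{RT} + \frac{\Delta S^{\circ}}{R}
```

In the second relation, a negative $\Delta H^{\circ}$ corresponds to exothermic adsorption and a positive $\Delta H^{\circ}$ to endothermic adsorption, which is the distinction the abstract draws between the adsorbents.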

15.
PLoS Comput Biol ; 18(3): e1009941, 2022 03.
Article in English | MEDLINE | ID: mdl-35263332

ABSTRACT

Transcription factors (TFs) play an important role in regulating gene expression, so identifying the sites they bind has become a fundamental step in molecular and cellular biology. In this paper, we developed a deep learning framework leveraging existing fully convolutional neural networks (FCN) to predict TF-DNA binding signals at the base-resolution level (named FCNsignal). The proposed FCNsignal can simultaneously achieve the following tasks: (i) modeling the base-resolution signals of binding regions; (ii) discriminating binding or non-binding regions; (iii) locating TF-DNA binding regions; (iv) predicting binding motifs. In addition, FCNsignal can also be used to predict opening regions across the whole genome. The experimental results on 53 TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that our proposed framework outperforms some existing state-of-the-art methods. We also explored using the trained FCNsignal to locate all potential TF-DNA binding regions on a whole chromosome and to make predictions for DNA sequences of arbitrary length, and the results show that our framework can find most of the known binding regions and accept sequences of arbitrary length. Furthermore, we demonstrated the potential ability of our framework in discovering causal disease-associated single-nucleotide polymorphisms (SNPs) through a series of experiments.


Subject(s)
Deep Learning , Binding Sites , Chromatin Immunoprecipitation Sequencing , Protein Binding , Transcription Factors/metabolism
16.
Article in English | MEDLINE | ID: mdl-32750884

ABSTRACT

The attention mechanism has the ability to find important information in a sequence. The regions of an RNA sequence that can bind to proteins are more important than those that cannot. Neither conventional methods nor deep learning-based methods are good at learning this information. In this study, LSTM is used to extract the correlation features between different sites in the RNA sequence. We also use the attention mechanism to evaluate the importance of different sites in the RNA sequence. We obtain the optimal combination of k-mer length, k-mer stride window, k-mer sentence length, k-mer sentence stride window, and optimization function through hyperparameter experiments. The results show that the performance of our method is better than that of other methods. We tested the effects of changes in k-mer vector length on model performance. We show how model performance changes under various k-mer related parameter settings. Furthermore, we investigate the effect of the attention mechanism and RNA structure data on model performance.
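
The four k-mer parameters listed in this abstract can be read as two nested sliding windows: one over bases to produce k-mer tokens, and one over tokens to produce fixed-length "sentences". The sketch below is one plausible reading under that assumption; the abstract does not define the terms precisely, and the function names and example sequence are hypothetical.

```python
def kmers(seq, k, stride=1):
    """Slide a window of k bases over seq, advancing `stride` bases each step."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def sentences(tokens, length, stride=1):
    """Group consecutive k-mer tokens into windows of `length` tokens."""
    return [tokens[i:i + length]
            for i in range(0, len(tokens) - length + 1, stride)]

rna = "AUGGCUA"                       # hypothetical 7-nt RNA fragment
tokens_s2 = kmers(rna, k=3, stride=2)  # ['AUG', 'GGC', 'CUA']
tokens_s1 = kmers(rna, k=3, stride=1)  # ['AUG', 'UGG', 'GGC', 'GCU', 'CUA']
sents = sentences(tokens_s1, length=3, stride=2)
# [['AUG', 'UGG', 'GGC'], ['GGC', 'GCU', 'CUA']]
```

Smaller strides give more overlapping tokens and longer inputs for the LSTM, which is why these settings are natural hyperparameters to tune jointly.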


Subject(s)
Deep Learning , Protein Binding , Proteins/chemistry , Proteins/genetics , RNA/genetics
17.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3144-3153, 2022.
Article in English | MEDLINE | ID: mdl-34882561

ABSTRACT

Discovery of transcription factor binding sites (TFBSs) is of primary importance for understanding the underlying binding mechanism and gene regulation process. Growing evidence indicates that apart from the primary DNA sequences, DNA shape landscape has a significant influence on transcription factor binding preference. To effectively model the co-influence of sequence and shape features, we emphasize the importance of position information of sequence motif and shape pattern. In this paper, we propose a novel deep learning-based architecture, named hybridShape eDeepCNN, for TFBS prediction which integrates DNA sequence and shape information in a spatially aligned manner. Our model utilizes the power of the multi-layer convolutional neural network and constructs an independent subnetwork to adapt to the distinct data distribution of heterogeneous features. In addition, we explore the usage of continuous embedding vectors as the representation of DNA sequences. Based on the experiments on 20 in-vitro datasets derived from universal protein binding microarrays (uPBMs), we demonstrate the superiority of our proposed method and validate the underlying design logic.


Subject(s)
DNA-Binding Proteins , Transcription Factors , Protein Binding , Transcription Factors/metabolism , Binding Sites/genetics , DNA-Binding Proteins/metabolism , DNA/chemistry
18.
IEEE J Biomed Health Inform ; 26(4): 1883-1890, 2022 04.
Article in English | MEDLINE | ID: mdl-34613923

ABSTRACT

Deciphering the relationship between transcription factors (TFs) and DNA sequences is very helpful for computational inference of gene regulation and a comprehensive understanding of gene regulation mechanisms. Transcription factor binding sites (TFBSs) are specific short DNA sequences that play a pivotal role in controlling gene expression through interaction with TF proteins. Although many computational and deep learning methods have recently been proposed to predict the sequence specificity of TF-DNA binding, there is still a lack of effective methods to directly locate TFBSs. To address this problem, in this paper we propose FCNGRU, which combines a fully convolutional neural network (FCN) with the gated recurrent unit (GRU) to directly locate TFBSs. Furthermore, we present a two-task framework (FCNGRU-double): one is a classification task at nucleotide level which predicts the probability of each nucleotide and locates TFBSs, and the other is a regression task at sequence level which predicts the intensity of each sequence. A series of experiments are conducted on 45 in-vitro datasets collected from the UniPROBE database derived from universal protein binding microarrays (uPBMs). Compared with competing methods, FCNGRU-double achieves much better results on these datasets. Moreover, FCNGRU-double has an advantage over a single-task framework, FCNGRU-single, which only contains the branch of locating TFBSs. In addition, we use in vivo datasets for further analysis and discussion.


Subject(s)
Computational Biology , Neural Networks, Computer , Binding Sites/genetics , Computational Biology/methods , DNA/chemistry , Humans , Nucleotides/metabolism , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
19.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3663-3672, 2022.
Article in English | MEDLINE | ID: mdl-34699364

ABSTRACT

The abuse of traditional antibiotics has led to an increase in the resistance of bacteria and viruses. Similar in function to antibacterial peptides, bacteriocins are a more common kind of peptide produced by bacteria that have bactericidal or bacteriostatic effects. More importantly, the marine environment is one of the most abundant resources for extracting marine microbial bacteriocins (MMBs). Identifying bacteriocins from marine microorganisms is a common goal for the development of new drugs. Effective use of MMBs will greatly alleviate the current antibiotic abuse problem. In this work, deep learning is used to identify meaningful MMBs. We propose a random multi-scale convolutional neural network method. In the scale setting, we set a random model to update the scale value randomly. The scale selection method can reduce the contingency caused by manual setting under certain conditions, thereby making the method more broadly applicable. The results show that the classification performance of the proposed method is better than that of state-of-the-art classification methods. In addition, some potential MMBs are predicted, and several sequence analyses are performed on these candidates. It is worth mentioning that after sequence analysis, the HNH endonucleases of different marine bacteria are considered as potential bacteriocins.


Subject(s)
Bacteria , Bacteriocins , Drug Discovery , Neural Networks, Computer , Anti-Bacterial Agents/chemistry , Bacteria/chemistry , Bacteriocins/chemistry , Bacteriocins/classification , Peptides , Drug Discovery/methods , Aquatic Organisms/chemistry , Sequence Analysis, DNA
20.
Mol Ther Nucleic Acids ; 24: 154-163, 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-33767912

ABSTRACT

The study of transcriptional regulation is still difficult yet fundamental in molecular biology research. Recent research has shown that the double helix structure of nucleotides plays an important role in improving the accuracy and interpretability of transcription factor binding site (TFBS) prediction. Although several computational methods have been designed to take both DNA sequence and DNA shape features into consideration simultaneously, designing an efficient model remains challenging. In this paper, we propose a hybrid convolutional recurrent neural network (CNN/RNN) architecture, CRPTS, to predict TFBSs by combining DNA sequence and DNA shape features. The novelty of our proposed method relies on three critical aspects: (1) the application of a shared hybrid CNN and RNN has the ability to efficiently extract features from large-scale genomic sequences obtained by high-throughput technology; (2) the common patterns were found from DNA sequences and their corresponding DNA shape features; (3) our proposed CRPTS can capture local structural information of DNA sequences without completely relying on DNA shape data. A series of comprehensive experiments on 66 in vitro datasets derived from universal protein binding microarrays (uPBMs) shows that our proposed method CRPTS clearly outperforms the state-of-the-art methods.
