Results 1 - 20 of 31
1.
BMC Bioinformatics ; 24(1): 481, 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38104057

ABSTRACT

BACKGROUND: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. RESULTS: To address these issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models on multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that incorporating a knowledge prior can significantly accelerate training. Finally, we conducted an interpretability analysis of each dimension (pathway) of the cell representation in the embedding space. CONCLUSIONS: The results showed that the cell representations obtained by scInterpreter are biologically meaningful. Through weight sorting, we found several new genes related to pathways in the PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool, and we expect it to greatly facilitate the study of single-cell transcriptomics.
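The "weight sorting" step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each embedding dimension corresponds to one pathway and that a trained model exposes a gene-by-pathway weight table; genes are then ranked by absolute weight on the pathway of interest. The gene and pathway names, the weights and the `top_genes` helper are made up for illustration and are not taken from scInterpreter.

```python
from typing import Dict, List, Tuple

def top_genes(weights: Dict[str, Dict[str, float]], pathway: str,
              k: int = 3) -> List[Tuple[str, float]]:
    """Return the k genes with the largest absolute weight on one pathway dimension."""
    # Skip genes that have no weight on this pathway (e.g. not in its gene set).
    scored = [(gene, w[pathway]) for gene, w in weights.items() if pathway in w]
    # Sort by magnitude: a large negative weight is as informative as a large positive one.
    scored.sort(key=lambda gw: abs(gw[1]), reverse=True)
    return scored[:k]

# Hypothetical gene-to-pathway weights (illustrative only).
weights = {
    "GENE1": {"pathway_A": 0.9, "pathway_B": 0.1},
    "GENE2": {"pathway_A": -1.2, "pathway_B": 0.0},
    "GENE3": {"pathway_A": 0.05, "pathway_B": 0.7},
    "GENE4": {"pathway_A": 0.4},
}
print(top_genes(weights, "pathway_A", k=2))  # [('GENE2', -1.2), ('GENE1', 0.9)]
```

Ranking by absolute value rather than signed weight is itself an assumption; a signed ranking would separate positively and negatively associated genes.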


Subject(s)
Leukocytes, Mononuclear , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Leukocytes, Mononuclear/metabolism , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis
2.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33498086

ABSTRACT

Transcription factors (TFs) play an important role in regulating gene expression, thus identification of the regions bound by them has become a fundamental step for molecular and cellular biology. In recent years, an increasing number of deep learning (DL) based methods have been proposed for predicting TF binding sites (TFBSs) and achieved impressive prediction performance. However, these methods mainly focus on predicting the sequence specificity of TF-DNA binding, which is equivalent to a sequence-level binary classification task, and fail to identify motifs and TFBSs accurately. In this paper, we developed a fully convolutional network coupled with global average pooling (FCNA), which by contrast is equivalent to a nucleotide-level binary classification task, to roughly locate TFBSs and accurately identify motifs. Experimental results on human ChIP-seq datasets show that FCNA outperforms other competing methods significantly. Besides, we find that the regions located by FCNA can be used by motif discovery tools to further refine the prediction performance. Furthermore, we observe that FCNA can accurately identify TF-DNA binding motifs across different cell lines and infer indirect TF-DNA bindings.
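The distinction between sequence-level and nucleotide-level classification can be made concrete with a small Python sketch. It assumes a model has already produced one binding probability per nucleotide (the values below are invented): global average pooling collapses them into a single sequence-level score, while thresholding the per-position scores gives a rough location of binding regions. The 0.5 threshold and the helper names are assumptions for illustration, not FCNA's actual post-processing.

```python
from typing import List, Tuple

def global_average_pool(scores: List[float]) -> float:
    """Collapse per-nucleotide scores into one sequence-level score."""
    return sum(scores) / len(scores)

def locate_regions(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return half-open [start, end) runs of positions whose score exceeds the threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i  # a candidate binding region opens here
        elif s <= threshold and start is not None:
            regions.append((start, i))  # the region closes before position i
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # region runs to the end of the sequence
    return regions

# Hypothetical per-nucleotide probabilities for a 10-bp sequence.
scores = [0.1, 0.2, 0.8, 0.9, 0.85, 0.3, 0.1, 0.6, 0.7, 0.2]
print(global_average_pool(scores))  # sequence-level score
print(locate_regions(scores))       # [(2, 5), (7, 9)]
```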


Subject(s)
Chromatin Immunoprecipitation Sequencing , Neural Networks, Computer , Response Elements , Sequence Analysis, DNA , Sequence Analysis, Protein , Transcription Factors , A549 Cells , Amino Acid Sequences , Humans , MCF-7 Cells , Transcription Factors/genetics , Transcription Factors/metabolism
3.
Brief Bioinform ; 22(2): 2085-2095, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32232320

ABSTRACT

Effectively representing Medical Subject Headings (MeSH) headings (terms) such as diseases and drugs as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree structure into a relationship network and applied several graph embedding algorithms to it to represent these terms. Specifically, the relationship network consists of nodes (MeSH headings) and edges (relationships), which can be constructed from the MeSH tree numbers. Then, five graph embedding algorithms, DeepWalk, LINE, SDNE, LAP and HOPE, were applied to the relationship network to represent MeSH headings as vectors. To evaluate the performance of the proposed methods, we carried out node classification and relationship prediction tasks. The results show that MeSH headings characterized by graph embedding algorithms can not only be treated as an independent carrier for representation, but can also be used as additional information to enhance the representation ability of vectors. Thus, these embeddings can serve as input to, and play a significant role in, any computational models related to diseases, drugs, microbes, etc. Moreover, we hope our method will inspire relevant researchers to study the representation of terms from this network perspective.
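MeSH tree numbers encode the hierarchy as dot-separated segments, and a child's tree number extends its parent's by one segment, so parent-child edges can be recovered by dropping the last segment. A minimal Python sketch of that construction follows, assuming a small made-up mapping from headings to tree numbers (a heading may carry several tree numbers, which the sketch handles). The heading names, the `X01` numbers and the `build_edges` helper are hypothetical, and the graph-embedding step (DeepWalk, LINE, SDNE, LAP, HOPE) is not shown.

```python
from typing import Dict, List, Set, Tuple

def build_edges(tree_numbers: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Link each heading to the heading that owns its parent tree number."""
    # Invert the mapping: tree number -> heading that carries it.
    owner = {num: heading for heading, nums in tree_numbers.items() for num in nums}
    edges: Set[Tuple[str, str]] = set()
    for heading, nums in tree_numbers.items():
        for num in nums:
            if "." not in num:
                continue  # top-level tree number: no parent
            parent_num = num.rsplit(".", 1)[0]  # drop the last segment
            parent = owner.get(parent_num)
            if parent is not None and parent != heading:
                edges.add((parent, heading))  # directed parent -> child edge
    return edges

# Hypothetical headings and tree numbers (not real MeSH data).
tree = {
    "Disease A": ["X01"],
    "Disease B": ["X01.100"],
    "Disease C": ["X01.100.200", "X01.300"],
    "Disease D": ["X01.300.400"],
}
for edge in sorted(build_edges(tree)):
    print(edge)
```

The resulting edge list is the kind of input a graph-embedding library would take; treating edges as undirected is a further choice left to the embedding step.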


Subject(s)
Algorithms , Medical Subject Headings , Computer Simulation , Drug Delivery Systems , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , Semantics
4.
PLoS Comput Biol ; 18(3): e1009941, 2022 03.
Article in English | MEDLINE | ID: mdl-35263332

ABSTRACT

Transcription factors (TFs) play an important role in regulating gene expression, so identifying the sites they bind has become a fundamental step in molecular and cellular biology. In this paper, we developed a deep learning framework, named FCNsignal, that leverages existing fully convolutional neural networks (FCN) to predict TF-DNA binding signals at base resolution. FCNsignal can simultaneously (i) model the base-resolution signals of binding regions; (ii) discriminate binding from non-binding regions; (iii) locate TF-DNA binding regions; and (iv) predict binding motifs. FCNsignal can also be used to predict open chromatin regions across the whole genome. Experimental results on 53 TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that our framework outperforms several existing state-of-the-art methods. In addition, we used the trained FCNsignal to locate all potential TF-DNA binding regions on a whole chromosome and to make predictions on DNA sequences of arbitrary length; the results show that our framework can find most of the known binding regions and accepts sequences of arbitrary length. Furthermore, through a series of experiments we demonstrated the potential of our framework for discovering causal disease-associated single-nucleotide polymorphisms (SNPs).


Subject(s)
Deep Learning , Binding Sites , Chromatin Immunoprecipitation Sequencing , Protein Binding , Transcription Factors/metabolism
5.
BMC Bioinformatics ; 22(Suppl 5): 622, 2022 Mar 22.
Article in English | MEDLINE | ID: mdl-35317723

ABSTRACT

BACKGROUND: lncRNAs play a critical role in numerous biological processes and life activities, especially diseases. Because traditional wet-lab experiments for identifying uncovered lncRNA-disease associations are costly in time and labor, it is imperative to construct reliable and efficient computational models as a complement. Deep learning technologies have made impressive contributions in many areas, but their feasibility in bioinformatics has not been adequately verified. RESULTS: In this paper, a machine learning-based model called LDACE was proposed to predict potential lncRNA-disease associations by combining an Extreme Learning Machine (ELM) and a Convolutional Neural Network (CNN). Specifically, the representation vectors are constructed by integrating multiple types of biological information, including functional similarity and semantic similarity. Then, a CNN is applied to mine both local and global features. Finally, ELM is chosen to carry out the prediction task to detect potential lncRNA-disease associations. The proposed method achieved an area under the receiver operating characteristic curve (AUC) of 0.9086 in leave-one-out cross-validation and 0.8994 in fivefold cross-validation. In addition, two kinds of case studies, based on lung cancer and endometrial cancer, indicate the robustness and efficiency of LDACE even in a real environment. CONCLUSIONS: Substantial results demonstrate that the proposed model is expected to be an auxiliary tool to guide and assist biomedical research, and that the close integration of deep learning and biological big data will provide life sciences with novel insights.


Subject(s)
RNA, Long Noncoding , Computational Biology/methods , Machine Learning , Neural Networks, Computer , RNA, Long Noncoding/genetics , ROC Curve
6.
BMC Med Inform Decis Mak ; 21(Suppl 1): 308, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34736437

ABSTRACT

BACKGROUND: Disease-drug associations provide essential information for drug discovery and disease treatment. Many disease-drug associations remain unobserved or unknown, and trials to confirm these associations are time-consuming and expensive. To better understand and explore these valuable associations, it would be useful to develop computational methods for predicting unobserved disease-drug associations. With the advent of various datasets describing diseases and drugs, it has become more feasible to build a model describing the potential correlation between diseases and drugs. RESULTS: In this work, we propose a new prediction method, called LMFDA, which works in several stages. First, it studies drug chemical structure, disease MeSH descriptors, disease-related phenotypic terms, and drug-drug interactions. On this basis, similarity networks from different sources are constructed to enrich the representation of drugs and diseases. Based on the fused disease similarity network and drug similarity network, LMFDA calculates the association score of each disease-drug pair in the database. This method achieves good performance on Fdataset and Cdataset, with AUROCs of 91.6% and 92.1%, respectively, higher than many existing computational models. CONCLUSIONS: The novelty of LMFDA lies in the introduction of multimodal fusion using low-rank tensors to fuse multiple similarity networks, combined with matrix completion to predict potential associations. We have demonstrated that LMFDA displays excellent network integration ability for accurate disease-drug association inference and achieves substantial improvement over advanced approaches. Overall, experimental results on two real-world network datasets demonstrate that LMFDA delivers excellent detection performance. The results also suggest that refining similarity networks with as much domain knowledge as possible is a promising direction for drug repositioning.


Subject(s)
Computational Biology , Pharmaceutical Preparations , Algorithms , Databases, Factual , Drug Discovery , Drug Repositioning
7.
BMC Bioinformatics ; 21(1): 401, 2020 Sep 10.
Article in English | MEDLINE | ID: mdl-32912137

ABSTRACT

BACKGROUND: As an important non-coding RNA, microRNA (miRNA) plays a significant role in a series of life processes and is closely associated with a variety of human diseases. Hence, identifying potential miRNA-disease associations can make great contributions to the research and treatment of human diseases. However, to our knowledge, many existing computational methods use only a single type of known association information between miRNAs and diseases to predict their potential associations, without considering their interactions or associations with other types of molecules. RESULTS: In this paper, we propose a network embedding-based method for predicting miRNA-disease associations by preserving behavior and attribute information. First, a heterogeneous network is constructed by integrating known associations among miRNAs, proteins and diseases, and the network representation method Learning Graph Representations with Global Structural Information (GraRep) is applied to learn the behavior information of miRNAs and diseases in the network. Then, the behavior information of miRNAs and diseases is combined with their attribute information to represent miRNA-disease association pairs. Finally, the prediction model is built with the Random Forest algorithm. Under five-fold cross-validation, the proposed NEMPD model obtained an average prediction accuracy of 85.41%, a sensitivity of 80.96% and an AUC of 91.58%. The performance of NEMPD is further validated by case studies: among the top 50 predicted disease-related miRNAs, 48 (breast neoplasms), 47 (colon neoplasms) and 47 (lung neoplasms) were confirmed by two other databases. CONCLUSIONS: The proposed NEMPD model performs well in predicting potential associations between miRNAs and diseases and shows great potential for future miRNA-disease association prediction.
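Metrics of the kind reported above (accuracy, sensitivity and AUC) can be computed as in the following minimal Python sketch. It assumes a 0.5 decision threshold for accuracy and sensitivity, and computes AUC with the rank-based formulation: the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counting one half, which equals the area under the ROC curve. The labels and scores are made up; under cross-validation the metrics would be computed per fold and then averaged.

```python
from typing import List, Tuple

def accuracy_and_sensitivity(labels: List[int], scores: List[float],
                             threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy = (TP + TN) / N; sensitivity (recall) = TP / (TP + FN)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return (tp + tn) / len(labels), tp / (tp + fn)

def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = known association) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
acc, sens = accuracy_and_sensitivity(labels, scores)
print(acc, sens, roc_auc(labels, scores))
```

The pairwise loop is O(P x N), which is fine for illustration; library ROC routines sort the scores instead.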


Subject(s)
Breast Neoplasms/diagnosis , Colonic Neoplasms/diagnosis , Computational Biology/methods , Lung Neoplasms/diagnosis , MicroRNAs/metabolism , Algorithms , Area Under Curve , Breast Neoplasms/genetics , Colonic Neoplasms/genetics , Female , Humans , Lung Neoplasms/genetics , MicroRNAs/genetics , ROC Curve
8.
J Cell Mol Med ; 24(1): 79-87, 2020 01.
Article in English | MEDLINE | ID: mdl-31568653

ABSTRACT

LncRNAs and miRNAs are key molecules in the competing endogenous RNA (ceRNA) mechanism, and their interactions have been found to play important roles in gene regulation. As a supplement to identifying lncRNA-miRNA interactions from CLIP-seq experiments, in silico prediction can select the most promising candidates for experimental validation. Although developing computational tools for predicting lncRNA-miRNA interactions is of great importance for deciphering the ceRNA mechanism, little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA-miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network with similarities based on the expression profiles of lncRNAs and miRNAs. Based on this data integration, a linear neighbour representation method was introduced to construct a prediction model. To evaluate the prediction performance of the proposed model, k-fold cross-validation was performed. As a result, LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 in 2-fold, 5-fold and 10-fold cross-validation, respectively. A series of comparison experiments with other methods was also conducted, and the results showed that our method is feasible and effective for predicting lncRNA-miRNA interactions via a combination of different types of useful side information. It is anticipated that LNRLMI could be a useful tool for predicting the non-coding RNA regulatory networks in which lncRNAs and miRNAs are involved.
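The k-fold evaluation can be sketched in Python as below: the known interaction pairs are shuffled once and split into k disjoint test folds, each fold is held out in turn while the remaining folds are used for training, and per-fold scores are summarized as mean ± standard deviation. Whether the reported ± values are taken across folds or across repeated runs is not stated in the abstract; the seed, the helper names and the per-fold AUCs in the usage lines are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once, cut it into k nearly equal test folds, return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # every k-th shuffled index goes to fold i
    splits = []
    for i, test in enumerate(folds):
        # Training set = all indices from the other k - 1 folds.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits

def summarize(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, as in 'AUC = mean ± std'."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)

# Usage: 2-, 5- and 10-fold splits over 100 hypothetical known interactions.
for k in (2, 5, 10):
    splits = k_fold_indices(100, k)
    print(k, [len(test) for _, test in splits])

# Hypothetical per-fold AUCs, used only to show the summary step.
mean_auc, sd_auc = summarize([0.84, 0.85, 0.85, 0.84, 0.86])
print(f"AUC = {mean_auc:.4f} ± {sd_auc:.4f}")
```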


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , MicroRNAs/metabolism , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism , Area Under Curve , Gene Expression Profiling , Humans , MicroRNAs/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics
9.
BMC Med Inform Decis Mak ; 20(Suppl 2): 49, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32183788

ABSTRACT

BACKGROUND: The key to modern drug discovery is to find, identify and prepare drug molecular targets. However, because of limitations in throughput, precision and cost, traditional experimental methods are difficult to apply widely for inferring potential Drug-Target Interactions (DTIs). Therefore, it is urgent to develop effective computational methods to validate the interactions between drugs and targets. METHODS: We developed a deep learning-based model for DTI prediction. Protein evolutionary features are extracted via the Position Specific Scoring Matrix (PSSM) and Legendre Moments (LM) and combined with drug molecular substructure fingerprints to form feature vectors of drug-target pairs. Then we used Sparse Principal Component Analysis (SPCA) to compress the features of drugs and proteins into a uniform vector space. Lastly, a deep long short-term memory network (DeepLSTM) was constructed to carry out prediction. RESULTS: The experimental results show a significant improvement in DTI prediction performance, with AUCs of 0.9951, 0.9705, 0.9951 and 0.9206 on four important classes of drug-target datasets, respectively. Further experiments preliminarily show that the proposed characterization scheme has a great advantage in feature expression and recognition. We also show that the proposed method works well on small datasets. CONCLUSION: The results demonstrate that the proposed approach has a great advantage over state-of-the-art drug-target predictors. To the best of our knowledge, this study is the first to test the potential of a deep learning method with memory and Turing completeness in DTI prediction.


Subject(s)
Deep Learning , Memory, Short-Term/drug effects , Neural Networks, Computer , Pharmaceutical Preparations , Drug Development , Humans , Principal Component Analysis , Proteins
10.
BMC Genomics ; 20(Suppl 13): 928, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881833

ABSTRACT

BACKGROUND: Identification of protein-protein interactions (PPIs) is crucial for understanding biological processes and investigating the cellular functions of genes. Self-interacting proteins (SIPs) are a specific type of PPI in which two or more copies of the same protein interact with each other. SIP detection has drawn increasing attention and several prediction models have been proposed, but problems remain. Hence, there is an urgent need for an efficient computational model for SIP prediction. RESULTS: In this study, we developed an effective model to predict SIPs, called RP-FIRF, which combines a Random Projection (RP) classifier with a Finite Impulse Response Filter (FIRF). More specifically, each protein sequence was first transformed into a Position Specific Scoring Matrix (PSSM) using Position Specific Iterated BLAST (PSI-BLAST). Then, to extract discriminative SIP features and improve prediction performance, the FIRF method was applied to the PSSM. The RP classifier was then used to perform classification and predict novel SIPs. We evaluated the performance of the proposed RP-FIRF model and compared it with a state-of-the-art support vector machine (SVM) on human and yeast datasets. The proposed model achieved high average accuracies of 97.89% and 97.35% using five-fold cross-validation. To further evaluate the proposed method, we also compared it with six other existing methods, and the experimental results demonstrated that our model surpasses these previous approaches. CONCLUSION: Experimental results show that self-interacting proteins are accurately predicted by the proposed model on both the human and yeast datasets, demonstrating that the model can predict SIPs effectively. Thus, the RP-FIRF model is an automatic decision-support method that should provide useful insights into the recognition of SIPs.


Subject(s)
Proteins/metabolism , Support Vector Machine , Area Under Curve , Databases, Protein , Humans , Principal Component Analysis , Protein Interaction Maps , Proteins/chemistry , ROC Curve , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism
11.
Int J Mol Sci ; 20(4)2019 Feb 21.
Article in English | MEDLINE | ID: mdl-30795499

ABSTRACT

Predicting self-interacting proteins (SIPs) is an important problem in bioinformatics. SIPs are proteins of which two or more identical copies, expressed from the same gene, can interact with each other. Self-interaction plays a major role in the evolution of protein-protein interactions (PPIs) and cellular functions. Owing to the limitations of experimental identification of SIPs, it is increasingly important to develop useful computational tools for predicting SIPs from protein sequence information. Therefore, we propose a novel prediction model called RP-FFT that combines a Random Projection (RP) model with the Fast Fourier Transform (FFT) for detecting SIPs. First, each protein sequence was transformed into a Position Specific Scoring Matrix (PSSM) using Position Specific Iterated BLAST (PSI-BLAST). Second, protein sequence features were extracted by applying the FFT to the PSSM. Lastly, we evaluated the performance of RP-FFT and compared the RP classifier with a state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets; in five-fold cross-validation, the RP-FFT model obtained high average accuracies of 96.28% and 91.87% on the human and yeast datasets, respectively. The experimental results demonstrate that the RP-FFT prediction model is reasonable and robust.
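As a rough illustration of how a Fourier transform can turn a variable-length PSSM (L rows by 20 amino-acid columns) into a fixed-length feature vector, the sketch below applies a discrete Fourier transform along the sequence axis of each column and keeps the magnitudes of the first few coefficients. Which coefficients RP-FFT keeps, and how, is not stated in the abstract; `n_coeffs=4`, the use of magnitudes and the helper names are assumptions. A naive O(L²) DFT keeps the example dependency-free (an FFT routine such as `numpy.fft.fft` would be used in practice), the toy matrices are random numbers rather than PSI-BLAST output, and the random projection classifier is not shown.

```python
import cmath
import random
from typing import List

def dft_magnitudes(signal: List[float], n_coeffs: int) -> List[float]:
    """Magnitudes of the first n_coeffs DFT coefficients (naive O(L^2) DFT)."""
    length = len(signal)
    mags = []
    for k in range(n_coeffs):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / length)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags

def pssm_fft_features(pssm: List[List[float]], n_coeffs: int = 4) -> List[float]:
    """Map an L x 20 PSSM to a fixed-length vector (20 * n_coeffs values) for any L."""
    features: List[float] = []
    for col in range(len(pssm[0])):
        column = [row[col] for row in pssm]  # one amino-acid column along the sequence
        features.extend(dft_magnitudes(column, n_coeffs))
    return features

# Two toy "PSSMs" of different lengths.
rng = random.Random(0)
short = [[rng.uniform(-5, 5) for _ in range(20)] for _ in range(30)]
long_ = [[rng.uniform(-5, 5) for _ in range(20)] for _ in range(120)]
print(len(pssm_fft_features(short)), len(pssm_fft_features(long_)))  # 80 80
```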


Subject(s)
Fourier Analysis , Sequence Analysis, Protein/methods , Support Vector Machine , Animals , Binding Sites , Humans , Protein Binding , Saccharomyces cerevisiae Proteins/chemistry
12.
Math Biosci Eng ; 21(3): 4440-4462, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38549335

ABSTRACT

This paper investigates the prescribed-time event-triggered cluster practical consensus problem for a class of nonlinear multi-agent systems with external disturbances. First, to achieve prescribed-time cluster practical consensus, a new time-varying function is introduced and a novel distributed continuous algorithm is designed. Based on Lyapunov stability theory and inequality techniques, sufficient conditions ensuring prescribed-time cluster practical consensus are given. Moreover, to prevent the final states of different clusters from overlapping, a virtual leader is considered for each cluster. In this case, an event-triggered distributed protocol is further established, and related conditions for achieving prescribed-time cluster practical consensus are given. Additionally, it is proven that Zeno behavior can be avoided by choosing parameters appropriately. Finally, numerical examples are presented to show the effectiveness of the theoretical results.

13.
Article in English | MEDLINE | ID: mdl-35389869

ABSTRACT

DNA-binding proteins (DBPs) play vital roles in the regulation of biological systems. Although there are already many deep learning methods for predicting the sequence specificities of DBPs, they face two challenges. First, classic deep learning methods for DBP prediction usually fail to capture the dependencies between genomic sequences, since the one-hot codes they commonly use are mutually orthogonal. Second, these methods usually perform poorly when samples are inadequate. To address these two challenges, we developed a novel language model for mining DBPs using human genomic data and ChIP-seq datasets with decaying learning rates, named the DNA Fine-tuned Language Model (DFLM). It captures the dependencies between genome sequences from the context of human genomic data and then fine-tunes the features for DBP tasks using different ChIP-seq datasets. We first compared DFLM with existing widely used methods on 69 datasets, where it achieved excellent performance. Moreover, we conducted comparative experiments on complex DBPs and small datasets, and DFLM still achieved a significant improvement. Finally, through visualization analysis of one-hot encoding and DFLM, we found that one-hot encoding completely cuts off the dependencies within DNA sequences, whereas DFLM, using a language model, represents these dependencies well. Source code is available at: https://github.com/Deep-Bioinfo/DFLM.
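The orthogonality point can be checked directly: under one-hot encoding, the dot product between the codes of any two distinct bases is zero, so the encoding by itself treats every pair of bases as equally unrelated. The minimal Python sketch below shows only this property of one-hot codes; it is not DFLM, and the `one_hot` and `dot` helpers are illustrative names. Unknown bases such as `N` map to an all-zero row, which is one common convention and an assumption here.

```python
from typing import List

ALPHABET = "ACGT"

def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 binary matrix (columns A, C, G, T)."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq.upper()]

def dot(u: List[int], v: List[int]) -> int:
    return sum(a * b for a, b in zip(u, v))

codes = one_hot("ACGT")
gram = [[dot(u, v) for v in codes] for u in codes]
for row in gram:
    print(row)
# The Gram matrix is the identity: every off-diagonal entry is 0, so no pair
# of distinct bases is represented as more similar than any other pair.
```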


Subject(s)
Algorithms , DNA-Binding Proteins , Humans , Genomics , DNA/genetics , Genome
14.
IEEE J Biomed Health Inform ; 27(9): 4611-4622, 2023 09.
Article in English | MEDLINE | ID: mdl-37368803

ABSTRACT

The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Efficient therapeutic peptide prediction is critical for peptide drug discovery. However, most existing methods make effective predictions for only one class of therapeutic peptides. Notably, no current predictive method considers sequence length information as a distinct feature of therapeutic peptides. In this article, we propose DeepTPpred, a novel deep learning approach with matrix factorization for predicting therapeutic peptides that integrates length information. The matrix factorization layer learns the latent features of the encoded sequence through a mechanism of first compression and then restoration, and the length features of therapeutic peptide sequences are embedded alongside the encoded amino acid sequences. To learn therapeutic peptide prediction automatically, these latent features are fed into neural networks with a self-attention mechanism. DeepTPpred achieved excellent prediction results on eight therapeutic peptide datasets. Building on these datasets, we first integrated the eight datasets into a full therapeutic peptide integration dataset. Then, we obtained two functional integration datasets based on the functional similarity of the peptides. Finally, we also conducted experiments on the latest versions of the ACP and CPP datasets. Overall, the experimental results show that our work is effective for identifying therapeutic peptides.


Subject(s)
Deep Learning , Humans , Peptides/chemistry , Neural Networks, Computer , Drug Discovery
15.
Mol Ther Nucleic Acids ; 32: 721-728, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37251691

ABSTRACT

Identifying proteins that interact with drug compounds has been recognized as an important part of the drug discovery process. Despite extensive efforts invested in predicting compound-protein interactions (CPIs), existing traditional methods still face several challenges. Computer-aided methods can identify high-quality CPI candidates instantaneously. In this research, a novel model named GraphCPIs is proposed to improve CPI prediction accuracy. First, we build the adjacency matrix of entities connected to both drugs and proteins from the collected dataset. Then, node feature representations are obtained using a graph convolutional network and the GraRep embedding model. Finally, an extreme gradient boosting (XGBoost) classifier is used to identify potential CPIs based on the two stacked kinds of features. The results demonstrate that GraphCPIs achieves the best performance, with an average prediction accuracy of 90.09%, an average area under the receiver operating characteristic curve of 0.9572, and an average area under the precision-recall curve of 0.9621. Moreover, comparative experiments reveal that our method surpasses state-of-the-art approaches in accuracy and other metrics in the same experimental environment. We believe that the GraphCPIs model will provide valuable insight for discovering novel candidate drug-related proteins.

16.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2610-2618, 2023.
Article in English | MEDLINE | ID: mdl-35675235

ABSTRACT

Accumulating evidence shows that circular RNAs (circRNAs) play an important role in regulating gene expression and are involved in many complex human diseases. Identifying associations between circRNAs and diseases helps in understanding the pathogenesis, treatment and diagnosis of complex diseases. Since inferring circRNA-disease associations through biological experiments is costly and time-consuming, there is an urgent need to develop computational models to identify these associations. In this paper, we propose a novel method named KNN-NMF, which combines K nearest neighbors with nonnegative matrix factorization to infer associations between circRNAs and diseases. First, we compute the Gaussian Interaction Profile (GIP) kernel similarity of circRNAs and diseases, and the semantic similarity of diseases. Then, new circRNA-disease interaction profiles are established using weighted K nearest neighbors to reduce the impact of false-negative associations on prediction performance. Finally, nonnegative matrix factorization is applied to predict circRNA-disease associations. The experimental results indicate that KNN-NMF outperforms the competing methods under five-fold cross-validation. Moreover, case studies of two common diseases further show that KNN-NMF can identify potential circRNA-disease associations effectively.
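The GIP kernel step can be sketched as follows, using the formulation common in association-prediction work: each circRNA's interaction profile is its row of the binary circRNA-disease matrix A, and K(i, j) = exp(-gamma * ||A_i - A_j||^2), where gamma is a bandwidth gamma' divided by the mean squared profile norm. The choice gamma' = 1 is an assumption, since the abstract does not give the paper's setting. Disease GIP similarity is obtained the same way from the columns of A. The toy matrix is made up.

```python
import math
from typing import List

def gip_kernel(profiles: List[List[int]], gamma_prime: float = 1.0) -> List[List[float]]:
    """GIP kernel between rows of a binary association matrix.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), gamma = gamma_prime / mean_i ||p_i||^2.
    Assumes at least one nonzero entry (otherwise the mean norm is 0).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalised by average profile size
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * d2)
    return kernel

# Toy association matrix: 3 circRNAs (rows) x 3 diseases (columns).
A = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
circ_sim = gip_kernel(A)
disease_sim = gip_kernel([list(col) for col in zip(*A)])  # columns = disease profiles
print([round(v, 4) for v in circ_sim[0]])  # [1.0, 0.4724, 0.1054]
```

The weighted KNN profile update and the NMF step are not shown; this block covers only the similarity computation named in the abstract.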

17.
IEEE J Biomed Health Inform ; 26(4): 1883-1890, 2022 04.
Article in English | MEDLINE | ID: mdl-34613923

ABSTRACT

Deciphering the relationship between transcription factors (TFs) and DNA sequences is very helpful for computational inference of gene regulation and for a comprehensive understanding of gene regulation mechanisms. Transcription factor binding sites (TFBSs) are specific short DNA sequences that play a pivotal role in controlling gene expression through interaction with TF proteins. Although many computational and deep learning methods have recently been proposed to predict the sequence specificity of TF-DNA binding, effective methods to directly locate TFBSs are still lacking. To address this problem, in this paper we propose FCNGRU, which combines a fully convolutional neural network (FCN) with a gated recurrent unit (GRU) to directly locate TFBSs. Furthermore, we present a two-task framework (FCNGRU-double): one task is nucleotide-level classification, which predicts the probability of each nucleotide and locates TFBSs; the other is sequence-level regression, which predicts the intensity of each sequence. A series of experiments was conducted on 45 in vitro datasets collected from the UniPROBE database, derived from universal protein binding microarrays (uPBMs). Compared with competing methods, FCNGRU-double achieves much better results on these datasets. Moreover, FCNGRU-double has an advantage over the single-task framework, FCNGRU-single, which contains only the TFBS-locating branch. In addition, we use in vivo datasets for further analysis and discussion.


Subject(s)
Computational Biology , Neural Networks, Computer , Binding Sites/genetics , Computational Biology/methods , DNA/chemistry , Humans , Nucleotides/metabolism , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
18.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3663-3672, 2022.
Article in English | MEDLINE | ID: mdl-34699364

ABSTRACT

The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Similar in function to antibacterial peptides, bacteriocins are a common class of peptides produced by bacteria that have bactericidal or bacteriostatic effects. More importantly, the marine environment is one of the most abundant sources of marine microbial bacteriocins (MMBs). Identifying bacteriocins from marine microorganisms is a common goal for the development of new drugs, and effective use of MMBs could greatly alleviate the current problem of antibiotic abuse. In this work, deep learning is used to identify meaningful MMBs. We propose a random multi-scale convolutional neural network method in which a random model updates the scale value randomly. Under certain conditions, this scale selection method can reduce the contingency introduced by manually chosen settings, making the method more broadly applicable. The results show that the classification performance of the proposed method is better than that of state-of-the-art classification methods. In addition, several potential MMBs are predicted, and various sequence analyses are performed on these candidates. Notably, after sequence analysis, the HNH endonucleases of different marine bacteria are considered potential bacteriocins.


Subject(s)
Bacteria , Bacteriocins , Drug Discovery , Neural Networks, Computer , Anti-Bacterial Agents/chemistry , Bacteria/chemistry , Bacteriocins/chemistry , Bacteriocins/classification , Peptides , Drug Discovery/methods , Aquatic Organisms/chemistry , Sequence Analysis, DNA
19.
Biology (Basel) ; 11(5)2022 May 13.
Article in English | MEDLINE | ID: mdl-35625468

ABSTRACT

The key to new drug discovery and development is, first and foremost, the search for molecular targets of drugs, which advances both drug discovery and drug repositioning. However, the traditional experimental identification of drug-target interactions (DTIs) is a costly, lengthy, high-risk, and low-success-rate undertaking. Consequently, more and more pharmaceutical companies are using computational technologies to screen existing drug molecules and mine new drugs, accelerating new drug development. In the current study, we designed MSPEDTI, a deep learning model based on Molecular Structure and Protein Evolutionary information, to predict potential DTIs. The model first fuses protein evolutionary information with drug structure information, then uses a convolutional neural network (CNN) to mine hidden features, and finally predicts the associated DTIs with an extreme learning machine (ELM). In cross-validation experiments, MSPEDTI achieved prediction accuracies of 94.19%, 90.95%, 87.95%, and 86.11% on the gold-standard enzyme, ion channel, G-protein-coupled receptor (GPCR), and nuclear receptor datasets, respectively. MSPEDTI was also competitive in ablation experiments and in comparisons with previous methods. Additionally, 7 of 10 potential DTIs predicted by MSPEDTI were confirmed by a classical database. These results demonstrate that MSPEDTI can provide reliable drug candidate targets and facilitate drug repositioning and drug development.
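The final stage of this pipeline is an extreme learning machine (ELM), whose defining trait is that the hidden-layer weights are random and fixed while only the output weights are fitted, in closed form. The sketch below is a generic regularized ELM for binary classification, assuming sigmoid hidden units, labels coded as ±1, and a hypothetical regularization constant `reg`; it solves the ridge problem in its n x n form, beta = H^T (H H^T + reg*I)^-1 T. The fused evolutionary/structural features and the CNN feature extractor are not reproduced, and the toy inputs are placeholders rather than DTI data.

```python
import math
import random
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class ELM:
    """Single-hidden-layer extreme learning machine: random, fixed hidden
    weights; output weights fitted by regularized least squares."""

    def __init__(self, n_hidden: int, reg: float = 1e-2, seed: int = 0):
        self.n_hidden, self.reg, self.rng = n_hidden, reg, random.Random(seed)

    def _hidden(self, x: List[float]) -> List[float]:
        # sigmoid(w_j . x + b_j) for every random hidden unit j
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(wj, x)) + bj)))
                for wj, bj in zip(self.w, self.b)]

    def fit(self, xs: List[List[float]], ts: List[float]) -> "ELM":
        d = len(xs[0])
        self.w = [[self.rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]  # n x L hidden-layer output matrix H
        n = len(xs)
        # K = H H^T + reg * I, then alpha = K^-1 T and beta = H^T alpha
        k = [[sum(p * q for p, q in zip(h[i], h[j])) + (self.reg if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        alpha = solve(k, ts)
        self.beta = [sum(alpha[i] * h[i][l] for i in range(n)) for l in range(self.n_hidden)]
        return self

    def predict(self, xs: List[List[float]]) -> List[float]:
        return [sum(bl * hl for bl, hl in zip(self.beta, self._hidden(x))) for x in xs]


# Usage on placeholder 2-D features with +/-1 labels
xs = [[-2.0, 0.5], [-1.5, -0.3], [-1.0, 1.0], [-0.5, -1.0],
      [0.5, 0.8], [1.0, -0.6], [1.5, 0.2], [2.0, -1.2]]
ts = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
model = ELM(n_hidden=40).fit(xs, ts)
preds = model.predict(xs)
print([1 if p > 0 else -1 for p in preds])
```

Fitting reduces to one linear solve, with no backpropagation through the classifier; this is the usual reason for pairing an ELM with a separately trained feature extractor.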

20.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3144-3153, 2022.
Article in English | MEDLINE | ID: mdl-34882561

ABSTRACT

Discovery of transcription factor binding sites (TFBSs) is of primary importance for understanding the underlying binding mechanism and the gene regulation process. Growing evidence indicates that, beyond the primary DNA sequence, the DNA shape landscape significantly influences transcription factor binding preference. To model the joint influence of sequence and shape features effectively, we emphasize the positional information of sequence motifs and shape patterns. In this paper, we propose a novel deep learning-based architecture, named hybridShape eDeepCNN, for TFBS prediction that integrates DNA sequence and shape information in a spatially aligned manner. Our model builds on a multi-layer convolutional neural network and constructs an independent subnetwork to accommodate the distinct data distribution of the heterogeneous features. We also explore continuous embedding vectors as representations of DNA sequences. Experiments on 20 in vitro datasets derived from universal protein binding microarrays (uPBMs) demonstrate the superiority of the proposed method and validate its underlying design logic.
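The key design point above is that sequence and shape features stay aligned position by position while each feature type is processed by its own subnetwork. A minimal sketch under stated assumptions, none taken from the paper: a 4-letter one-hot sequence encoding; hypothetical shape tracks named `MGW` and `Roll` with made-up values (real values would come from a DNA shape prediction tool); step-level tracks padded by repeating their last value; per-track z-scoring to handle the differing distributions; a single filter per branch; and a position-wise sum as the merge.

```python
from typing import Dict, List

BASES = "ACGT"


def one_hot_dna(seq: str) -> List[List[float]]:
    """L x 4 one-hot matrix; ambiguous bases such as N become all-zero rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def align_shape(tracks: Dict[str, List[float]], length: int) -> List[List[float]]:
    """Pad every shape track to the sequence length and z-score it on its own,
    returning an L x k matrix whose row i describes nucleotide i."""
    columns = []
    for name in sorted(tracks):
        values = list(tracks[name])[:length]
        # step-level tracks (length L-1) are padded by repeating the last value
        values += [values[-1]] * (length - len(values))
        mean = sum(values) / length
        std = (sum((v - mean) ** 2 for v in values) / length) ** 0.5 or 1.0
        columns.append([(v - mean) / std for v in values])
    return [list(row) for row in zip(*columns)]


def conv_branch(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Same-length 1-D convolution (zero padding) with one filter, then ReLU:
    output i is centred on input position i."""
    width, dim = len(kernel), len(x[0])
    half = width // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for k in range(width):
            j = i + k - half
            if 0 <= j < len(x):
                s += sum(kernel[k][c] * x[j][c] for c in range(dim))
        out.append(max(s, 0.0))
    return out


def hybrid_score(seq: str, tracks: Dict[str, List[float]],
                 seq_kernel: List[List[float]], shape_kernel: List[List[float]]) -> float:
    """Independent sequence and shape branches, merged position by position, then max-pooled."""
    seq_out = conv_branch(one_hot_dna(seq), seq_kernel)
    shape_out = conv_branch(align_shape(tracks, len(seq)), shape_kernel)
    merged = [a + b for a, b in zip(seq_out, shape_out)]  # index i refers to the same nucleotide in both
    return max(merged)


# Usage with made-up shape values and untrained kernels
seq = "ACGTTGCAAC"
tracks = {
    "MGW": [5.0, 5.2, 4.8, 4.5, 5.1, 5.3, 4.9, 4.7, 5.0, 5.2],  # per nucleotide (length L)
    "Roll": [-1.0, 2.0, 0.5, -0.5, 1.5, 0.0, -2.0, 1.0, 0.3],   # per step (length L-1)
}
seq_kernel = [[0.5, -0.2, 0.1, 0.3], [0.2, 0.4, -0.1, 0.0], [-0.3, 0.1, 0.6, 0.2]]
shape_kernel = [[0.3, -0.1], [0.2, 0.2], [-0.1, 0.4]]
print(hybrid_score(seq, tracks, seq_kernel, shape_kernel))
```

Merging only after each branch has produced a same-length output is what keeps a motif hit and the shape pattern at that location combined in one score; concatenating channels at each position would be an equally aligned alternative.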


Subject(s)
DNA-Binding Proteins , Transcription Factors , Protein Binding , Transcription Factors/metabolism , Binding Sites/genetics , DNA-Binding Proteins/metabolism , DNA/chemistry