Search | VHL Regional Portal

1.

scVSC: Deep variational subspace clustering for single-cell transcriptome data.

Wang, Zile; Wang, Haiyun; Zhao, Jianping; Xia, Junfeng; Zheng, Chunhou.

IEEE/ACM Trans Comput Biol Bioinform ; PP, 2024 May 27.

Article in English | MEDLINE | ID: mdl-38801694

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a potent advancement for analyzing gene expression at the individual cell level, allowing for the identification of cellular heterogeneity and subpopulations. However, it suffers from technical limitations that result in sparse and heterogeneous data. Here, we propose scVSC, an unsupervised clustering algorithm built on deep representation neural networks. The method incorporates variational inference into the subspace model, which imposes regularization constraints on the latent space and further prevents overfitting. In a series of experiments across multiple datasets, scVSC outperforms existing state-of-the-art unsupervised and semi-supervised clustering tools in clustering accuracy and running efficiency. Moreover, the study indicates that scVSC could visually reveal the state of trajectory differentiation, accurately identify differentially expressed genes, and further discover biologically critical pathways.

2.

A Novel Skip-Connection Strategy by Fusing Spatial and Channel Wise Features for Multi-Region Medical Image Segmentation.

Tan, Dayu; Hao, Rui; Zhou, Xiaoping; Xia, Junfeng; Su, Yansen; Zheng, Chunhou.

IEEE J Biomed Health Inform ; PP, 2024 May 29.

Article in English | MEDLINE | ID: mdl-38809722

ABSTRACT

Recent methods often introduce attention mechanisms into the skip connections of U-shaped networks to capture features. However, these methods usually overlook spatial information extraction in skip connections and exhibit inefficiency in capturing spatial and channel information. This issue prompts us to reevaluate the design of the skip-connection mechanism and propose a new deep-learning network called the Fusing Spatial and Channel Attention Network, abbreviated as FSCA-Net. FSCA-Net is a novel U-shaped network architecture that utilizes the Parallel Attention Transformer (PAT) to enhance the extraction of spatial and channel features in the skip-connection mechanism, further compensating for downsampling losses. We design the Cross-Attention Bridge Layer (CAB) to mitigate excessive feature and resolution loss when downsampling to the lowest level, ensuring meaningful information fusion during upsampling at the lowest level. Finally, we construct the Dual-Path Channel Attention (DPCA) module to guide channel and spatial information filtering for Transformer features, eliminating ambiguities with decoder features and better concatenating features with semantic inconsistencies between the Transformer and the U-Net decoder. FSCA-Net is designed explicitly for fine-grained segmentation tasks of multiple organs and regions. Our approach achieves over 48% reduction in FLOPs and over 32% reduction in parameters compared to the state-of-the-art method. Moreover, FSCA-Net outperforms existing segmentation methods on seven public datasets, demonstrating exceptional performance. The code has been made available on GitHub: https://github.com/Henry991115/FSCA-Net.

3.

DeepFGRN: inference of gene regulatory network with regulation type based on directed graph embedding.

Gao, Zhen; Su, Yansen; Xia, Junfeng; Cao, Rui-Fen; Ding, Yun; Zheng, Chun-Hou; Wei, Pi-Jing.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38581416

ABSTRACT

The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.

Subject(s)

Gene Regulatory Networks , Liver Neoplasms , Humans , Systems Biology/methods , Transcriptome , Algorithms , Computational Biology/methods

4.

dbCRAF: a curated knowledgebase for regulation of radiation response in human cancer.

Liu, Jie; Li, Jing; Jin, Fangfang; Li, Qian; Zhao, Guoping; Wu, Lijun; Li, Xiaoyan; Xia, Junfeng; Cheng, Na.

NAR Cancer ; 6(1): zcae008, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38406264

ABSTRACT

Radiation therapy (RT) is one of the primary treatment modalities of cancer, with 40-60% of cancer patients benefiting from RT during their treatment course. The intrinsic radiosensitivity or acquired radioresistance of tumor cells affects the response to RT and clinical outcomes in patients. Thus, mining the regulatory mechanisms in tumor radiosensitivity or radioresistance that have been verified by biological experiments and computational analysis methods will enhance the overall understanding of RT. Here, we describe a comprehensive database dbCRAF (http://dbCRAF.xialab.info/) to document and annotate the factors (1,677 genes, 49 proteins and 612 radiosensitizers) linked with radiation response, including radiosensitivity and radioresistance in cancer cells and prognosis in cancer patients receiving RT. On the one hand, dbCRAF enables researchers to directly access knowledge for regulation of radiation response in human cancer buried in the vast literature. On the other hand, dbCRAF provides four flexible modules to analyze and visualize the functional relationship between these factors and clinical outcome, KEGG pathway and target genes. In conclusion, dbCRAF serves as a valuable resource for elucidating the regulatory mechanisms of radiation response in human cancers as well as for the improvement of RT options.

5.

Effect Predictor of Driver Synonymous Mutations Based on Multi-Feature Fusion and Iterative Feature Representation Learning.

Cheng, Na; Bi, Chuanmei; Shi, Yong; Liu, Mengya; Cao, Anqi; Ren, Mengkun; Xia, Junfeng; Liang, Zhen.

IEEE J Biomed Health Inform ; 28(2): 1144-1151, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38096097

ABSTRACT

Accurate identification of driver mutations is crucial in genetic studies of human cancers. While numerous cancer driver missense mutations have been identified, research into potential cancer drivers for synonymous mutations has shown limited success to date. Here, we developed a novel machine learning framework, epSMic, for predicting cancer driver synonymous mutations. epSMic employs an iterative feature representation scheme that facilitates the learning of discriminative features from various sequential models in a supervised iterative mode. We constructed the benchmark datasets and encoded the embedding sequence, physicochemical property, and basic information such as conservation and splicing feature. The evaluation results on benchmark test datasets demonstrate that epSMic outperforms existing methods, making it a valuable tool for researchers in identifying functional synonymous mutations in cancer. We hope epSMic can enable researchers to concentrate on synonymous mutations that have a functional impact on cancer.

Subject(s)

Neoplasms , Silent Mutation , Humans , Neoplasms/genetics , Machine Learning

6.

MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy.

Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei.

ACS Omega ; 8(44): 41930-41942, 2023 Nov 07.

Article in English | MEDLINE | ID: mdl-37969991

ABSTRACT

As one of the most important post-translational modifications (PTM), lysine acetylation (Kace) plays an important role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. Instead, several machine learning methods have been developed for Kace site prediction, and hand-crafted features have been used to encode the protein sequences. However, there are still two challenges: the complex biological information may be under-represented by these manmade features, and the small-sample issue of some species needs to be addressed. We propose a novel model, MSTL-Kace, which was developed based on a transfer learning strategy with a pretrained bidirectional encoder representations from transformers (BERT) model. In this model, the high-level embeddings were extracted from species-specific BERT models, and a two-stage fine-tuning strategy was used to deal with the small-sample issue. Specifically, a domain-specific BERT model was pretrained using all of the sequences in our data sets, which was then fine-tuned, or two-stage fine-tuned, based on the training data set of each species to obtain the species-specific BERT models. Afterward, the embeddings of residues were extracted from the fine-tuned model and fed to the different downstream learning algorithms. After comparison, the best model for the six prokaryotic species was built by using a random forest. The results for the independent test sets show that our model outperforms the state-of-the-art methods on all six species. The source codes and data for MSTL-Kace are available at https://github.com/leo97king/MSTL-Kace.

7.

Identification of region-specific splicing QTLs in human hippocampal tissue and its distinctive role in brain disorders.

Li, Xiaoyan; Zhao, Yiran; Kong, Hui; Song, Chengcheng; Liu, Jie; Xia, Junfeng.

iScience ; 26(10): 107958, 2023 Oct 20.

Article in English | MEDLINE | ID: mdl-37810239

ABSTRACT

Alternative splicing (AS) regulation has an essential role in complex diseases. However, the AS profiles in the hippocampal (HIPPO) region of the human brain are underexplored. Here, we investigated cis-acting sQTLs of the HIPPO region in 264 samples and identified thousands of significant sQTLs. By enrichment analysis and functional characterization of these sQTLs, we found that the HIPPO sQTLs were enriched among histone-marked regions, transcription factor binding sites, RNA-binding protein sites, and brain disorder-associated loci. Comparative analyses with the dorsolateral prefrontal cortex revealed the importance of AS regulation in HIPPO (rg = 0.87). Furthermore, we performed a transcriptome-wide association study of Alzheimer's disease and identified 16 significant genes whose genetically regulated splicing levels may have a causal role in Alzheimer's disease. Overall, our study improves our knowledge of transcriptome gene regulation in the HIPPO region and provides novel insights into elucidating the pathogenesis of potential genes associated with brain disorders.

8.

Mendelian Randomization Using the Druggable Genome Reveals Genetically Supported Drug Targets for Psychiatric Disorders.

Li, Xiaoyan; Shen, Aotian; Zhao, Yiran; Xia, Junfeng.

Schizophr Bull ; 49(5): 1305-1315, 2023 09 07.

Article in English | MEDLINE | ID: mdl-37418754

ABSTRACT

BACKGROUND AND HYPOTHESIS: Psychiatric disorders impose a huge health and economic burden on modern society. However, there is currently no proven completely effective treatment available, partly owing to the inefficiency of drug target identification and validation. We aim to identify therapeutic targets relevant to psychiatric disorders by conducting Mendelian randomization (MR) analysis. STUDY DESIGN: We performed genome-wide MR analysis by integrating expression quantitative trait loci (eQTL) of 4479 actionable genes that encode druggable proteins and genetic summary statistics from genome-wide association studies of psychiatric disorders. After conducting colocalization analysis on the brain MR findings, we employed protein quantitative trait loci (pQTL) data as proposed genetic instruments for intersecting the colocalized genes to provide further genetic evidence. STUDY RESULTS: By performing MR and colocalization analysis with eQTL genetic instruments, we obtained 31 promising drug targets for psychiatric disorders, including 21 significant genes for schizophrenia, 7 for bipolar disorder, 2 for depression, 1 for attention deficit hyperactivity disorder (ADHD) and none for autism spectrum disorder. Combining MR results using pQTL genetic instruments, we finally proposed 8 drug-targeting genes supported by the strongest MR evidence, including the genes ACE, BTN3A3, HAPLN4, MAPK3 and NEK4 for schizophrenia, the genes NEK4 and HAPLN4 for bipolar disorder, and the gene TIE1 for ADHD. CONCLUSIONS: Our findings with genetic support are more likely to succeed in clinical trials. In addition, our study prioritizes approved drug targets for the development of new therapies and provides critical drug reuse opportunities for psychiatric disorders.

Subject(s)

Attention Deficit Disorder with Hyperactivity , Autism Spectrum Disorder , Bipolar Disorder , Humans , Genome-Wide Association Study/methods , Mendelian Randomization Analysis/methods , Bipolar Disorder/drug therapy , Bipolar Disorder/genetics , Attention Deficit Disorder with Hyperactivity/genetics , Polymorphism, Single Nucleotide/genetics
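
As an illustration of the Mendelian randomization step mentioned in the abstract of record 8, the following is a minimal sketch of the Wald ratio, the standard single-instrument MR estimator. The abstract does not state which estimator the study used, so this choice, the function name `wald_ratio` and the example numbers are all assumptions made for illustration. The standard error uses the common first-order approximation that ignores uncertainty in the exposure effect.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate (illustrative, not the study's code).

    beta_exposure: effect of the variant on the exposure (e.g. an eQTL effect)
    beta_outcome:  effect of the same variant on the outcome (GWAS effect)
    se_outcome:    standard error of beta_outcome
    """
    estimate = beta_outcome / beta_exposure
    # First-order approximation: uncertainty in beta_exposure is ignored.
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, se, p_value


# Hypothetical numbers, chosen only to show the call.
est, se, p = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
print(est, se, p)
```

With several independent instruments per gene, the per-variant ratios are usually combined (for example by inverse-variance weighting) rather than reported one by one.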

9.

CNNGRN: A Convolutional Neural Network-Based Method for Gene Regulatory Network Inference From Bulk Time-Series Expression Data.

Gao, Zhen; Tang, Jin; Xia, Junfeng; Zheng, Chun-Hou; Wei, Pi-Jing.

IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2853-2861, 2023.

Article in English | MEDLINE | ID: mdl-37267145

ABSTRACT

Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is far from the ideal standard, so it is urgent to design a more effective method to reconstruct GRNs. Moreover, most methods only consider the gene expression data, ignoring the network structure information contained in GRNs. In this study, we propose a supervised model named CNNGRN, which infers GRNs from bulk time-series expression data via a convolutional neural network (CNN) model, with a more informative feature. Bulk time-series gene expression data imply the intricate regulatory associations between genes, and the network structure feature of the ground-truth GRN contains rich neighbor information. Hence, CNNGRN integrates the above two features as model inputs. In addition, a CNN is adopted to extract intricate features of genes and infer the potential associations between regulators and target genes. Moreover, feature importance visualization experiments are implemented to seek the key features. Experimental results show that CNNGRN achieved competitive performance on benchmark datasets compared to the state-of-the-art computational methods. Finally, hub genes identified based on CNNGRN have been confirmed to be involved in biological processes through literature.

Subject(s)

Algorithms , Gene Regulatory Networks , Gene Regulatory Networks/genetics , Time Factors , Neural Networks, Computer , Systems Biology , Computational Biology/methods

10.

Deep learning-based multi-functional therapeutic peptides prediction with a multi-label focal dice loss function.

Fan, Henghui; Yan, Wenhui; Wang, Lihua; Liu, Jie; Bin, Yannan; Xia, Junfeng.

Bioinformatics ; 39(6), 2023 06 01.

Article in English | MEDLINE | ID: mdl-37216900

ABSTRACT

MOTIVATION: With the great number of peptide sequences produced in the postgenomic era, it is highly desirable to identify the various functions of therapeutic peptides quickly. Furthermore, it is a great challenge to accurately predict multi-functional therapeutic peptides (MFTP) via sequence-based computational tools. RESULTS: Here, we propose a novel multi-label-based method, named ETFC, to predict 21 categories of therapeutic peptides. The method utilizes a deep learning-based model architecture, which consists of four blocks: embedding, text convolutional neural network, feed-forward network, and classification blocks. This method also adopts an imbalanced learning strategy with a novel multi-label focal dice loss function. Multi-label focal dice loss is applied in the ETFC method to solve the inherent imbalance problem in the multi-label dataset and achieve competitive performance. The experimental results show that the ETFC method is significantly better than the existing methods for MFTP prediction. With the established framework, we use teacher-student-based knowledge distillation to obtain the attention weights from the self-attention mechanism in the MFTP prediction and quantify their contributions toward each of the investigated activities. AVAILABILITY AND IMPLEMENTATION: The source code and dataset are available via: https://github.com/xialab-ahu/ETFC.

Subject(s)

Deep Learning , Humans , Neural Networks, Computer , Peptides/therapeutic use , Software
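
To make the imbalance-handling idea in the abstract of record 10 concrete, here is a minimal sketch of the two generic ingredients its loss is named after: a binary focal loss and a soft Dice loss, both computed over a multi-label batch. The abstract does not give the formula of the paper's multi-label focal dice loss, so this is not that loss. The function names, the default `gamma` and `smooth` values, and the unweighted sum in `combined_loss` are assumptions chosen for illustration.

```python
import math


def focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over all (sample, label) entries.

    probs and labels are lists of rows; labels hold 0 or 1.
    Well-classified entries are down-weighted by (1 - p_t) ** gamma.
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p_t = p if y == 1 else 1.0 - p
            p_t = min(max(p_t, eps), 1.0 - eps)  # avoid log(0)
            total += -((1.0 - p_t) ** gamma) * math.log(p_t)
            count += 1
    return total / count


def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss averaged over labels (1 - Dice coefficient per label)."""
    n_labels = len(probs[0])
    losses = []
    for k in range(n_labels):
        inter = sum(p[k] * y[k] for p, y in zip(probs, labels))
        denom = sum(p[k] for p in probs) + sum(y[k] for y in labels)
        losses.append(1.0 - (2.0 * inter + smooth) / (denom + smooth))
    return sum(losses) / n_labels


def combined_loss(probs, labels):
    """Hypothetical combination: plain sum of the two terms."""
    return focal_loss(probs, labels) + dice_loss(probs, labels)


labels = [[1, 0], [0, 1]]
print(combined_loss([[0.9, 0.1], [0.2, 0.7]], labels))
```

Focal weighting reduces the contribution of easy entries, while the Dice term scores overlap per label and is therefore less dominated by the many negative entries of rare labels.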

11.

PredDSMC: A predictor for driver synonymous mutations in human cancers.

Wang, Lihua; Sun, Jianhui; Ma, Shunshuai; Xia, Junfeng; Li, Xiaoyan.

Front Genet ; 14: 1164593, 2023.

Article in English | MEDLINE | ID: mdl-37051593

ABSTRACT

Introduction: Driver mutations play a critical role in the occurrence and development of human cancers. Most studies have focused on missense mutations that function as drivers in cancer. However, accumulating experimental evidence indicates that synonymous mutations can also act as driver mutations. Methods: Here, we proposed a computational method called PredDSMC to accurately predict driver synonymous mutations in human cancers. We first systematically explored four categories of multimodal features, including sequence features, splicing features, conservation scores, and functional scores. Further feature selection was carried out to remove redundant features and improve the model performance. Finally, we utilized the random forest classifier to build PredDSMC. Results: The results of two independent test sets indicated that PredDSMC outperformed the state-of-the-art methods in differentiating driver synonymous mutations from passenger mutations. Discussion: In conclusion, we expect that PredDSMC, as a driver synonymous mutation prediction method, will be a valuable method for gaining a deeper understanding of synonymous mutations in human cancers.

12.

PhaGAA: an integrated web server platform for phage genome annotation and analysis.

Wu, Jiawei; Liu, Qingrui; Li, Min; Xu, Jiliang; Wang, Chen; Zhang, Junyin; Xiao, Minfeng; Bin, Yannan; Xia, Junfeng.

Bioinformatics ; 39(3), 2023 03 01.

Article in English | MEDLINE | ID: mdl-36882183

ABSTRACT

MOTIVATION: Phage genome annotation plays a key role in the design of phage therapy. To date, there have been various genome annotation tools for phages, but most of these tools focus on mono-functional annotation and have complex operational processes. Accordingly, comprehensive and user-friendly platforms for phage genome annotation are needed. RESULTS: Here, we propose PhaGAA, an online integrated platform for phage genome annotation and analysis. By incorporating several annotation tools, PhaGAA is constructed to annotate the prophage genome at DNA and protein levels and provide the analytical results. Furthermore, PhaGAA could mine and annotate phage genomes from bacterial genomes or metagenomes. In summary, PhaGAA will be a useful resource for experimental biologists and help advance phage synthetic biology in basic and applied research. AVAILABILITY AND IMPLEMENTATION: PhaGAA is freely available at http://phage.xialab.info/.

Subject(s)

Bacteriophages , Bacteriophages/genetics , Software , Computers , Metagenome , Genome, Bacterial , Molecular Sequence Annotation

13.

Deleterious synonymous mutation identification based on selective ensemble strategy.

Wang, Lihua; Zhang, Tao; Yu, Lihong; Zheng, Chun-Hou; Yin, Wenguang; Xia, Junfeng; Zhang, Tiejun.

Brief Bioinform ; 24(1), 2023 01 19.

Article in English | MEDLINE | ID: mdl-36611253

ABSTRACT

Although previous studies have revealed that synonymous mutations contribute to various human diseases, distinguishing deleterious synonymous mutations from benign ones is still a challenge in medical genomics. Recently, computational tools have been introduced to predict the harmfulness of synonymous mutations. However, most of these computational tools rely on balanced training sets without considering abundant negative samples that could result in deficient performance. In this study, we propose a computational model that uses a selective ensemble to predict deleterious synonymous mutations (seDSM). We construct several candidate base classifiers for the ensemble using balanced training subsets randomly sampled from the imbalanced benchmark training sets. The diversity measures of the base classifiers are calculated by the pairwise diversity metrics, and the classifiers with the highest diversities are selected for integration using soft voting for synonymous mutation prediction. We also design two strategies for filling in missing values in the imbalanced dataset and constructing models using different pairwise diversity metrics. The experimental results show that a selective ensemble based on double fault with the ensemble strategy EKNNI for filling in missing values is the most effective scheme. Finally, using 40-dimensional biology features, we propose a novel model based on a selective ensemble for predicting deleterious synonymous mutations (seDSM). seDSM outperformed other state-of-the-art methods on the independent test sets according to multiple evaluation indicators, indicating that it has an outstanding predictive performance for deleterious synonymous mutations. We hope that seDSM will be useful for studying deleterious synonymous mutations and advancing our understanding of synonymous mutations. The source code of seDSM is freely accessible at https://github.com/xialab-ahu/seDSM.git.

Subject(s)

Genomics , Silent Mutation , Humans , Genomics/methods , Software , Algorithms
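
The selective-ensemble idea in the abstract of record 13 (pairwise diversity, then soft voting) can be sketched as follows. The double-fault measure is the standard one: the fraction of samples that two classifiers both misclassify, with lower values meaning more diverse errors. How seDSM ranks and selects classifiers is not specified in the abstract, so the rule used here (keep the `k` classifiers with the lowest mean double fault against all others) and all function names are assumptions for illustration.

```python
from itertools import combinations


def double_fault(y_true, pred_a, pred_b):
    """Fraction of samples misclassified by both classifiers."""
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)


def select_diverse(y_true, preds, k):
    """Return indices of the k classifiers with the lowest mean double fault.

    preds is a list of label predictions, one list per base classifier.
    This ranking rule is an assumption, not taken from the paper.
    """
    n = len(preds)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        df = double_fault(y_true, preds[i], preds[j])
        totals[i] += df
        totals[j] += df
    ranked = sorted(range(n), key=lambda i: totals[i] / (n - 1))
    return sorted(ranked[:k])


def soft_vote(prob_lists, threshold=0.5):
    """Average positive-class probabilities across classifiers, then threshold."""
    n = len(prob_lists)
    averaged = [sum(column) / n for column in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


y_true = [1, 0, 1, 0]
preds = [[1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
print(select_diverse(y_true, preds, k=2))
print(soft_vote([[0.9, 0.2], [0.4, 0.4]]))
```

In a full pipeline each base classifier would be trained on a balanced subset drawn from the imbalanced training set, and the diversity measure would be computed on held-out predictions.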

14.

scDCCA: deep contrastive clustering for single-cell RNA-seq data based on auto-encoder network.

Wang, Jing; Xia, Junfeng; Wang, Haiyun; Su, Yansen; Zheng, Chun-Hou.

Brief Bioinform ; 24(1), 2023 01 19.

Article in English | MEDLINE | ID: mdl-36631401

ABSTRACT

The advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noise and significant sparsity of scRNA-seq data have made it a big challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationship among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results on 14 real datasets validate that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.

Subject(s)

Gene Expression Profiling , Single-Cell Gene Expression Analysis , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis

15.

frDSM: An Ensemble Predictor With Effective Feature Representation for Deleterious Synonymous Mutation in Human Genome.

Wang, Huadong; Sun, Jianhui; Liu, Mengya; Zheng, Chun-Hou; Xia, Junfeng; Cheng, Na.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 371-377, 2023.

Article in English | MEDLINE | ID: mdl-35420988

ABSTRACT

With the discovery of causality between synonymous mutations and diseases, it has become increasingly important to identify deleterious synonymous mutations for a better understanding of their functional mechanisms. Although several machine learning methods have been proposed to solve the task, an effective feature representation method that can make use of the inner difference and relevance between deleterious and benign synonymous mutations is still challenging considering the vast number of synonymous mutations in the human genome. In this work, we developed a robust and accurate predictor called frDSM for deleterious synonymous mutation prediction using logistic regression. More specifically, we introduced an effective feature representation learning method which exploits multiple feature descriptors from different perspectives, including functional scores obtained from previous computational methods, evolutionary conservation, splicing and sequence feature descriptors, and these feature descriptors were input into 76 XGBoost classifiers to obtain the predicted probability values. These probabilities were concatenated to generate a new 76-dimensional feature vector, and a feature selection method was used to remove redundant and irrelevant features. Experimental results show that, with 31 optimal features, frDSM enables more robust and accurate prediction than the competing prediction methods, which demonstrates the effectiveness of the feature representation learning method. frDSM is freely available at http://frdsm.xialab.info.

Subject(s)

Genome, Human , Silent Mutation , Humans , Genome, Human/genetics , Machine Learning , Algorithms

16.

PrMFTP: Multi-functional therapeutic peptides prediction based on multi-head self-attention mechanism and class weight optimization.

Yan, Wenhui; Tang, Wending; Wang, Lihua; Bin, Yannan; Xia, Junfeng.

PLoS Comput Biol ; 18(9): e1010511, 2022 09.

Article in English | MEDLINE | ID: mdl-36094961

ABSTRACT

Prediction of therapeutic peptides is a significant step for the discovery of promising therapeutic drugs. Most of the existing studies have focused on mono-functional therapeutic peptide prediction. However, the number of multi-functional therapeutic peptides (MFTP) is growing rapidly, which requires new computational schemes to be proposed to facilitate MFTP discovery. In this study, based on a multi-head self-attention mechanism and a class weight optimization algorithm, we propose a novel model called PrMFTP for MFTP prediction. PrMFTP exploits a multi-scale convolutional neural network, bi-directional long short-term memory, and multi-head self-attention mechanisms to fully extract and learn informative features of the peptide sequence to predict MFTP. In addition, we design a class weight optimization scheme to address the problem of label-imbalanced data. Comprehensive evaluations demonstrate that PrMFTP is superior to other state-of-the-art computational methods for predicting MFTP. We provide a user-friendly web server of PrMFTP, which is available at http://bioinfo.ahu.edu.cn/PrMFTP.

Subject(s)

Algorithms , Peptides , Peptides/therapeutic use

17.

Prediction of circRNA-Disease Associations Based on the Combination of Multi-Head Graph Attention Network and Graph Convolutional Network.

Cao, Ruifen; He, Chuan; Wei, Pijing; Su, Yansen; Xia, Junfeng; Zheng, Chunhou.

Biomolecules ; 12(7), 2022 07 02.

Article in English | MEDLINE | ID: mdl-35883487

ABSTRACT

Circular RNAs (circRNAs) are covalently closed single-stranded RNA molecules, which have many biological functions. Previous experiments have shown that circRNAs are involved in numerous biological processes, especially regulatory functions. It has also been found that circRNAs are associated with complex human diseases. Therefore, predicting the associations of circRNAs with diseases (called circRNA-disease associations) is useful for disease prevention, diagnosis and treatment. In this work, we propose a novel computational approach called GGCDA based on the Graph Attention Network (GAT) and Graph Convolutional Network (GCN) to predict circRNA-disease associations. Firstly, GGCDA combines circRNA sequence similarity, disease semantic similarity and corresponding Gaussian interaction profile kernel similarity, and then a random walk with restart algorithm (RWR) is used to obtain the preliminary features of circRNAs and diseases. Secondly, a heterogeneous graph is constructed from the known circRNA-disease association network and the calculated similarity of circRNAs and diseases. Thirdly, the multi-head GAT is adopted to obtain different weights of circRNA and disease features, and then GCN is employed to aggregate the features of adjacent nodes in the network and the features of the nodes themselves, so as to obtain multi-view circRNA and disease features. Finally, we use a multi-layer fully connected neural network to predict the associations of circRNAs with diseases. In comparison with state-of-the-art methods, GGCDA achieves AUC values of 0.9625 and 0.9485 under fivefold cross-validation on two datasets, and an AUC of 0.8227 on the independent test set. Case studies further demonstrate that our approach is promising for discovering potential circRNA-disease associations.

Subject(s)

Neural Networks, Computer , RNA, Circular , Algorithms , Computational Biology/methods , Humans , RNA , RNA, Circular/genetics
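
The random walk with restart (RWR) step named in the abstract of record 17 can be illustrated with the textbook iteration p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized similarity matrix and r is the restart probability. How GGCDA configures RWR (restart value, normalization, seeds) is not given in the abstract, so the defaults and the function name below are assumptions, and the small similarity matrix is made up for the example.

```python
def random_walk_with_restart(sim, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker restarting at `seed`.

    sim is a square similarity matrix given as a list of rows.
    """
    n = len(sim)
    # Column-normalize so that each column of W sums to 1.
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    w = [
        [sim[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy similarity matrix over three nodes (hypothetical values).
sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
print(random_walk_with_restart(sim, seed=0))
```

Running the walk once per node and stacking the resulting vectors gives one feature row per node, which is one common way such scores are turned into preliminary node features.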

18.

DPProm: A Two-Layer Predictor for Identifying Promoters and Their Types on Phage Genome Using Deep Learning.

Wang, Chen; Zhang, Junyin; Cheng, Li; Wu, Jiawei; Xiao, Minfeng; Xia, Junfeng; Bin, Yannan.

IEEE J Biomed Health Inform ; 26(10): 5258-5266, 2022 10.

Article in English | MEDLINE | ID: mdl-35867364

ABSTRACT

With the number of phage genomes increasing, it is urgent to develop new bioinformatics methods for phage genome annotation. A promoter is a DNA region that is important for gene transcriptional regulation. In the post-genomic era, the availability of data makes it possible to establish robust computational models for promoter identification. In this work, we introduce DPProm, a two-layer model composed of DPProm-1L and DPProm-2L, to predict promoters and their types for phages. On the first layer, DPProm-1L, a dual-channel deep neural network ensemble method fusing multi-view features (sequence feature and handcrafted feature), is proposed to identify whether a DNA sequence is a promoter or a non-promoter. The sequence feature is extracted with a convolutional neural network (CNN), and the handcrafted feature is the combination of free energy, GC content, cumulative skew, and Z curve features. On the second layer, DPProm-2L, based on a CNN, is trained to predict the promoters' types (host or phage). To enable prediction on whole genomes, DPProm is combined with a novel sequence data processing workflow, which contains sliding window and sequence merging modules. Experimental results show that DPProm outperforms the state-of-the-art methods and effectively decreases the false positive rate on whole-genome prediction. Furthermore, we provide a user-friendly web server at http://bioinfo.ahu.edu.cn/DPProm. We expect that DPProm can serve as a useful tool for identification of promoters and their types.

Subject(s)

Bacteriophages , Deep Learning , Bacteriophages/genetics , DNA , Genomics/methods , Humans , Promoter Regions, Genetic/genetics

19.

dbBIP: a comprehensive bipolar disorder database for genetic research.

Li, Xiaoyan; Ma, Shunshuai; Yan, Wenhui; Wu, Yong; Kong, Hui; Zhang, Mingshan; Luo, Xiongjian; Xia, Junfeng.

Database (Oxford) ; 2022, 2022 07 02.

Article in English | MEDLINE | ID: mdl-35779245

ABSTRACT

Bipolar disorder (BIP) is one of the most common hereditary psychiatric disorders worldwide. Elucidating the genetic basis of BIP will play a pivotal role in mechanistic delineation. Genome-wide association studies (GWAS) have successfully reported multiple susceptibility loci conferring BIP risk, thus providing insight into the effects of its underlying pathobiology. However, difficulties remain in the extrication of important and biologically relevant data from genetic discoveries related to psychiatric disorders such as BIP. There is an urgent need for an integrated and comprehensive online database with unified access to genetic and multi-omics data for in-depth data mining. Here, we developed the dbBIP, a database for BIP genetic research based on published data. The dbBIP consists of several modules, i.e.: (i) single nucleotide polymorphism (SNP) module, containing large-scale GWAS genetic summary statistics and functional annotation information relevant to risk variants; (ii) gene module, containing BIP-related candidate risk genes from various sources and (iii) analysis module, providing a simple and user-friendly interface to analyze one's own data. We also conducted extensive analyses, including functional SNP annotation, integration (including summary-data-based Mendelian randomization and transcriptome-wide association studies), co-expression, gene expression, tissue expression, protein-protein interaction and brain expression quantitative trait loci analyses, thus shedding light on the genetic causes of BIP. Finally, we developed a graphical browser with powerful search tools to facilitate data navigation and access. The dbBIP provides a comprehensive resource for BIP genetic research as well as an integrated analysis platform for researchers and can be accessed online at http://dbbip.xialab.info. Database URL: http://dbbip.xialab.info.

Subject(s)

Bipolar Disorder , Genome-Wide Association Study , Bipolar Disorder/genetics , Genetic Research , Humans , Quantitative Trait Loci , Software

20.

scHFC: a hybrid fuzzy clustering method for single-cell RNA-seq data optimized by natural computation.

Wang, Jing; Xia, Junfeng; Tan, Dayu; Lin, Rongxin; Su, Yansen; Zheng, Chun-Hou.

Brief Bioinform ; 23(2), 2022 03 10.

Article in English | MEDLINE | ID: mdl-35136924

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has allowed researchers to explore biological phenomena at the cellular scale. Clustering is a crucial step in studying the heterogeneity of cells. Although many clustering methods have been proposed, massive dropout events and the curse of dimensionality still make scRNA-seq data difficult to analyze, because they reduce the accuracy of clustering methods and lead to misidentification of cell types. In this work, we propose scHFC, a hybrid fuzzy clustering method optimized by natural computation and based on the Fuzzy C-Means (FCM) and Gath-Geva (GG) algorithms. Specifically, principal component analysis is used to reduce the dimensions of the scRNA-seq data after preprocessing. Then, the FCM algorithm, optimized by simulated annealing and a genetic algorithm, is applied to cluster the data and output a membership matrix, which represents the initial clustering result and is taken as the input to the GG algorithm to obtain the final clustering results. We also develop a cluster number estimation method called multi-index comprehensive estimation, which estimates the number of clusters by combining four clustering effectiveness indexes. The performance of scHFC is evaluated on 17 scRNA-seq datasets and compared with six state-of-the-art methods. Experimental results show that scHFC performs better in terms of clustering accuracy and algorithmic stability. In short, scHFC is an effective method for clustering cells in scRNA-seq data, and it shows great potential for downstream analysis of scRNA-seq data. The source code is available at https://github.com/WJ319/scHFC.
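The membership matrix mentioned in the abstract is the core output of Fuzzy C-Means. A minimal pure-Python sketch of plain FCM follows. It is not the scHFC code: it omits the simulated annealing and genetic algorithm optimization, the PCA step, and the Gath-Geva refinement, and the function name, parameters, and random initialization are assumptions made here for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def fuzzy_c_means(
    points: Sequence[Sequence[float]],
    n_clusters: int,
    m: float = 2.0,       # fuzzifier; must be > 1
    n_iter: int = 100,
    tol: float = 1e-6,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Plain Fuzzy C-Means. Returns (membership matrix, cluster centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers: List[List[float]] = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )

        # Membership update: inverse relative distance to every center.
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dists[k] / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for k in range(n_clusters)
            ]
            max_change = max(
                max_change, max(abs(a - b) for a, b in zip(new_row, u[i]))
            )
            u[i] = new_row

        if max_change < tol:  # memberships have stabilized
            break

    return u, centers
```

Each row of the returned matrix holds one cell's degrees of membership across clusters and sums to 1. Taking the index of the largest entry in a row gives a hard cluster label.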

Subject(s)

Single-Cell Analysis , Software , Algorithms , Cluster Analysis , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
