Search | VHL Regional Portal

1.

Asymmetric total synthesis of polycyclic xanthenes and discovery of a WalK activator active against MRSA.

Cheng, Min-Jing; Wu, Yan-Yi; Zeng, Hao; Zhang, Tian-Hong; Hu, Yan-Xia; Liu, Shi-Yi; Cui, Rui-Qin; Hu, Chun-Xia; Zou, Quan-Ming; Li, Chuang-Chuang; Ye, Wen-Cai; Huang, Wei; Wang, Lei.

Nat Commun ; 15(1): 5879, 2024 Jul 13.

Article in English | MEDLINE | ID: mdl-38997253

ABSTRACT

The development of new antibiotics continues to pose challenges, particularly considering the growing threat of multidrug-resistant Staphylococcus aureus. Structurally diverse natural products provide a promising source of antibiotics. Herein, we outline a concise approach for the collective asymmetric total synthesis of polycyclic xanthene myrtucommulone D and five related congeners. The strategy involves rapid assembly of the challenging benzopyrano[2,3-a]xanthene core, highly diastereoselective establishment of three contiguous stereocenters through a retro-hemiketalization/double Michael cascade reaction, and a Mitsunobu-mediated chiral resolution approach with high optical purity and broad substrate scope. Quantum mechanical calculations provide insight into the mechanism of stereoselective construction of the three contiguous stereocenters. Additionally, this work leads to the discovery of an antibacterial agent against both drug-sensitive and drug-resistant S. aureus. This compound operates through a unique mechanism that promotes bacterial autolysis by activating the two-component sensory histidine kinase WalK. Our research holds potential for future antibacterial drug development.

Subject(s)

Anti-Bacterial Agents , Methicillin-Resistant Staphylococcus aureus , Xanthenes , Methicillin-Resistant Staphylococcus aureus/drug effects , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemical synthesis , Anti-Bacterial Agents/chemistry , Xanthenes/chemical synthesis , Xanthenes/pharmacology , Xanthenes/chemistry , Microbial Sensitivity Tests , Stereoisomerism , Polycyclic Compounds/chemical synthesis , Polycyclic Compounds/pharmacology , Polycyclic Compounds/chemistry , Drug Discovery , Molecular Structure

2.

Exploring the antiviral inhibitory activity of Niloticin against the NS2B/NS3 protease of Dengue virus (DENV2).

Stalin, Antony; Han, Jiajia; Reegan, Appadurai Daniel; Ignacimuthu, Savarimuthu; Liu, Shuwen; Yao, Xingang; Zou, Quan.

Int J Biol Macromol ; : 133791, 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38992553

ABSTRACT

Dengue virus (DENV2) is the cause of dengue disease and a worldwide health problem. DENV2 replicates in the host cell using polyproteins such as NS3 protease in conjunction with the NS2B cofactor, making NS3 protease a promising antiviral drug target. This study investigated the efficacy of niloticin against NS2B/NS3-protease. In silico and in vitro analyses were performed, which included the interaction of niloticin with NS2B/NS3-protease, protein stability and flexibility, mutation effect, betweenness centrality of residues, and analysis of cytotoxicity, protein expression and WNV NS3-protease activity. Like acyclovir, niloticin forms strong H-bonds and hydrophobic interactions with residues LEU149, ASN152, LYS74, GLY148 and ALA164. The niloticin-NS2B/NS3-protease complex was found to be stable compared with the apo NS2B/NS3-protease in structural deviation, PCA, compactness and FEL analysis. The IC50 value of niloticin was 0.14 µM in BHK cells based on in vitro cytotoxicity analysis and showed significant activity at 2.5 µM in a concentration-dependent manner. Western blotting and qRT-PCR analyses showed that niloticin reduced DENV2 protein transcription in a dose-dependent manner. In addition, inhibition of NS3-protease by niloticin was confirmed with the SensoLyte 440 WNV protease detection kit. These promising results suggest that niloticin could be an effective antiviral drug against DENV2 and other flaviviruses.

3.

Prediction of Potential miRNA-Disease Associations Based on a Masked Graph Autoencoder.

Ke, Chenchen; Feng, Hailin; Zou, Quan; Zhu, Zhechen; Liu, Tongcun.

IEEE/ACM Trans Comput Biol Bioinform ; PP2024 Jul 02.

Article in English | MEDLINE | ID: mdl-38954583

ABSTRACT

Biomedical evidence has demonstrated the relevance of microRNA (miRNA) dysregulation in complex human diseases, and determining the relationship between miRNAs and diseases can aid in the early detection and prevention of diseases. Traditional biological experimental methods have the disadvantages of high cost and low efficiency, which are well compensated by computational methods. However, many computational methods have the challenge of excessively focusing on the neighbor relationship, ignoring the structural information of the graph, and belittling the redundant information of the graph structure. This study proposed a computational model based on a graph-masking autoencoder named MGAEMDA. MGAEMDA is an asymmetric framework in which the encoder maps partially observed graphs into latent representations. The decoder reconstructs the masked structural information based on the edge and node levels and combines it with linear matrices to obtain the result. The empirical results on the two datasets reveal that the MGAEMDA model performs better than its counterparts. We also demonstrated the predictive performance of MGAEMDA using a case study of four diseases, and all the top 30 predicted miRNAs were validated in the database, providing further evidence of the excellent performance of the model.
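The masking step described in the abstract can be illustrated with a minimal sketch. It assumes a plain random edge mask; the abstract does not specify the masking scheme MGAEMDA actually uses, and the function name `mask_edges`, the `mask_ratio` parameter, and the toy node labels are hypothetical.

```python
import random


def mask_edges(edges, mask_ratio=0.3, seed=0):
    """Split an edge list into visible edges and masked edges.

    In a masked graph autoencoder, the encoder would see only the visible
    edges and the decoder would be trained to reconstruct the masked ones.
    This is a generic random mask, not the authors' implementation.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_masked = int(len(shuffled) * mask_ratio)
    masked = shuffled[:n_masked]
    visible = shuffled[n_masked:]
    return visible, masked


# Made-up miRNA-disease association edges (labels are placeholders).
edges = [("m1", "d1"), ("m1", "d2"), ("m2", "d1"), ("m3", "d3"), ("m2", "d3")]
visible, masked = mask_edges(edges, mask_ratio=0.4, seed=42)
```

The split is deterministic for a fixed seed, which makes it easy to compare reconstructions across runs.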

4.

GraphADT: Empowering interpretable predictions of acute dermal toxicity with Multi-View graph pooling and structure remapping.

Ma, Xinqian; Fu, Xiangzheng; Wang, Tao; Zhuo, Linlin; Zou, Quan.

Bioinformatics ; 2024 Jul 04.

Article in English | MEDLINE | ID: mdl-38967119

ABSTRACT

MOTIVATION: Accurate prediction of acute dermal toxicity (ADT) is essential for the safe and effective development of contact drugs. Currently, graph neural networks (GNNs), a form of deep learning technology, accurately model the structure of compound molecules, enhancing predictions of their ADT. However, many existing methods emphasize atom-level information transfer and overlook crucial data conveyed by molecular bonds and their interrelationships. Additionally, these methods often generate "equal" node representations across the entire graph, failing to accentuate "important" substructures like functional groups, pharmacophores, and toxicophores, thereby reducing interpretability. RESULTS: We introduce a novel model, GraphADT, utilizing structure remapping and multi-view graph pooling technologies to accurately predict compound ADT. Initially, our model applies structure remapping to better delineate bonds, transforming "bonds" into new nodes and "bond-atom-bond" interactions into new edges, thereby reconstructing the compound molecular graph. Subsequently, we employ multi-view graph pooling to amalgamate data from various perspectives, minimizing biases inherent to single-view analyses. Following this, the model generates a robust node ranking collaboratively, emphasizing critical nodes or substructures to enhance model interpretability. Lastly, we apply a graph comparison learning strategy to train both the original and structure remapped molecular graphs, deriving the final molecular representation. Experimental results on public datasets indicate that the GraphADT model outperforms existing state-of-the-art models. The GraphADT model has been demonstrated to effectively predict compound ADT, offering potential guidance for the development of contact drugs and related treatments. AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at: https://github.com/mxqmxqmxq/GraphADT.git.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
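The structure remapping described above (bonds become nodes, bond-atom-bond interactions become edges) resembles building the line graph of a molecular graph. The sketch below shows that idea on a toy molecule; it is not taken from the GraphADT repository, and the function name `remap_bonds_to_nodes` is hypothetical.

```python
from itertools import combinations


def remap_bonds_to_nodes(bonds):
    """Return edges between bond indices that share an atom.

    Each bond (a pair of atom indices) becomes a node identified by its
    position in `bonds`; two such nodes are connected when the bonds meet
    at a common atom (a bond-atom-bond interaction).
    """
    atom_to_bonds = {}
    for idx, (a, b) in enumerate(bonds):
        atom_to_bonds.setdefault(a, []).append(idx)
        atom_to_bonds.setdefault(b, []).append(idx)

    new_edges = set()
    for bond_ids in atom_to_bonds.values():
        for i, j in combinations(sorted(bond_ids), 2):
            new_edges.add((i, j))
    return sorted(new_edges)


# Toy four-atom chain 0-1-2-3: three bonds, two bond-atom-bond links.
bonds = [(0, 1), (1, 2), (2, 3)]
remapped_edges = remap_bonds_to_nodes(bonds)
```

For the chain, bond 0 and bond 1 meet at atom 1, and bond 1 and bond 2 meet at atom 2, so the remapped graph has two edges.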

5.

Predicting intercellular communication based on metabolite-related ligand-receptor interactions with MRCLinkdb.

Zhang, Yuncong; Yang, Yu; Ren, Liping; Zhan, Meixiao; Sun, Taoping; Zou, Quan; Zhang, Yang.

BMC Biol ; 22(1): 152, 2024 Jul 08.

Article in English | MEDLINE | ID: mdl-38978014

ABSTRACT

BACKGROUND: Metabolite-associated cell communications play critical roles in maintaining human biological function. However, most existing tools and resources focus only on ligand-receptor interaction pairs where both partners are proteinaceous, neglecting other non-protein molecules. To address this gap, we introduce the MRCLinkdb database and algorithm, which aggregates and organizes data related to non-protein L-R interactions in cell-cell communication, providing a valuable resource for predicting intercellular communication based on metabolite-related ligand-receptor interactions. RESULTS: Here, we manually curated the metabolite-ligand-receptor (ML-R) interactions from the literature and known databases, ultimately collecting over 790 human and 670 mouse ML-R interactions. Additionally, we compiled information on over 1900 enzymes and 260 transporter entries associated with these metabolites. We developed Metabolite-Receptor based Cell Link Database (MRCLinkdb) to store these ML-R interactions data. Meanwhile, the platform also offers extensive information for presenting ML-R interactions, including fundamental metabolite information and the overall expression landscape of metabolite-associated gene sets (such as receptor, enzymes, and transporter proteins) based on single-cell transcriptomics sequencing (covering 35 human and 26 mouse tissues, 52 human and 44 mouse cell types) and bulk RNA-seq/microarray data (encompassing 62 human and 39 mouse tissues). Furthermore, MRCLinkdb introduces a web server dedicated to the analysis of intercellular communication based on ML-R interactions. MRCLinkdb is freely available at https://www.cellknowledge.com.cn/mrclinkdb/ . CONCLUSIONS: In addition to supplementing ligand-receptor databases, MRCLinkdb may provide new perspectives for decoding the intercellular communication and advancing related prediction tools based on ML-R interactions.

Subject(s)

Cell Communication , Humans , Ligands , Animals , Mice , Databases, Factual

6.

Sequence homology score-based deep fuzzy network for identifying therapeutic peptides.

Guo, Xiaoyi; Zheng, Ziyu; Cheong, Kang Hao; Zou, Quan; Tiwari, Prayag; Ding, Yijie.

Neural Netw ; 178: 106458, 2024 Jun 10.

Article in English | MEDLINE | ID: mdl-38901093

ABSTRACT

The detection of therapeutic peptides is a topic of immense interest in the biomedical field. Conventional biochemical experiment-based detection techniques are tedious and time-consuming. Computational biology has become a useful tool for improving the detection efficiency of therapeutic peptides. Most computational methods do not consider the deviation caused by noise. To improve the generalization performance of therapeutic peptide prediction methods, this work presents a sequence homology score-based deep fuzzy echo-state network with maximizing mixture correntropy (SHS-DFESN-MMC) model. Our method is compared with the existing methods on eight types of therapeutic peptide datasets. The model parameters are determined by 10 fold cross-validation on their training sets and verified by independent test sets. Across the 8 datasets, the average area under the receiver operating characteristic curve (AUC) values of SHS-DFESN-MMC are the highest on both the training (0.926) and independent sets (0.923).
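For readers unfamiliar with the AUC metric reported above, the following sketch computes it from first principles as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The labels and scores are made-up illustration values, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = therapeutic peptide) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 positive-negative pairs are ordered correctly
```

The quadratic loop is fine for small examples; a rank-based formulation would be preferable for large datasets.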

7.

A comprehensive survey of dimensionality reduction and clustering methods for single-cell and spatial transcriptomics data.

Sun, Yidi; Kong, Lingling; Huang, Jiayi; Deng, Hongyan; Bian, Xinling; Li, Xingfeng; Cui, Feifei; Dou, Lijun; Cao, Chen; Zou, Quan; Zhang, Zilong.

Brief Funct Genomics ; 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38860675

ABSTRACT

In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.

8.

Drug-target interaction predictions with multi-view similarity network fusion strategy and deep interactive attention mechanism.

Song, Wei; Xu, Lewen; Han, Chenguang; Tian, Zhen; Zou, Quan.

Bioinformatics ; 40(6)2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38837345

ABSTRACT

MOTIVATION: Accurately identifying drug-target interactions (DTIs) is one of the crucial steps in the drug discovery and drug repositioning process. Many computational models have already been proposed for DTI prediction and have achieved significant improvements. However, these approaches pay little attention to fusing the multi-view similarity networks related to drugs and targets in an appropriate way. In addition, how to fully incorporate the known interaction relationships to accurately represent drugs and targets has not been well investigated. Therefore, there is still a need to improve the accuracy of DTI prediction models. RESULTS: In this study, we propose a novel approach that employs Multi-view similarity network fusion strategy and deep Interactive attention mechanism to predict Drug-Target Interactions (MIDTI). First, MIDTI constructs multi-view similarity networks of drugs and targets with their diverse information and integrates these similarity networks effectively in an unsupervised manner. Then, MIDTI obtains the embeddings of drugs and targets from multi-type networks simultaneously. After that, MIDTI adopts the deep interactive attention mechanism to further learn their discriminative embeddings comprehensively with the known DTI relationships. Finally, we feed the learned representations of drugs and targets to the multilayer perceptron model and predict the underlying interactions. Extensive results indicate that MIDTI significantly outperforms other baseline methods on the DTI prediction task. The results of the ablation experiments also confirm the effectiveness of the attention mechanism in the multi-view similarity network fusion strategy and the deep interactive attention mechanism. AVAILABILITY AND IMPLEMENTATION: https://github.com/XuLew/MIDTI.

Subject(s)

Computational Biology , Computational Biology/methods , Drug Discovery/methods , Algorithms , Drug Repositioning/methods , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry , Humans

9.

GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data.

Wang, Jia-Cheng; Chen, Yao-Jia; Zou, Quan.

IEEE Trans Neural Netw Learn Syst ; PP2024 Jun 19.

Article in English | MEDLINE | ID: mdl-38896510

ABSTRACT

Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black box manner, limiting their capacity for making causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and eases the accurate determination of multiple attributes of gene regulation that are important for determining the regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.

10.

Fusion of multi-source relationships and topology to infer lncRNA-protein interactions.

Zhang, Xinyu; Liu, Mingzhe; Li, Zhen; Zhuo, Linlin; Fu, Xiangzheng; Zou, Quan.

Mol Ther Nucleic Acids ; 35(2): 102187, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38706631

ABSTRACT

Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA's functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with LPN's topological data. Second, the implemented masking strategy efficiently identifies LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.
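The abstract names a PolyLoss function without stating which variant is used. As a hedged illustration only, the sketch below assumes the common Poly-1 form, which adds a term epsilon * (1 - p_t) to the cross-entropy of the true-class probability p_t. The function name `poly1_cross_entropy` and the default `epsilon` are illustrative choices, not values from the paper.

```python
import math


def poly1_cross_entropy(p_true, epsilon=1.0):
    """Poly-1 style loss for one example.

    p_true is the predicted probability of the correct class.
    With epsilon = 0 this reduces to ordinary cross-entropy.
    Assumed variant: the paper's exact PolyLoss settings are not given here.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    return -math.log(p_true) + epsilon * (1.0 - p_true)


# A confident correct prediction is penalized less than an uncertain one.
loss_confident = poly1_cross_entropy(0.9)
loss_uncertain = poly1_cross_entropy(0.5)
```

The extra term raises the penalty on low-confidence predictions relative to plain cross-entropy, and epsilon controls how strongly.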

11.

Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data.

Tian, Qinzhong; Zhang, Pinglu; Zhai, Yixiao; Wang, Yansu; Zou, Quan.

Genome Biol Evol ; 16(5)2024 05 02.

Article in English | MEDLINE | ID: mdl-38748485

ABSTRACT

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.

Subject(s)

High-Throughput Nucleotide Sequencing , Machine Learning , Databases, Genetic , Computational Biology/methods , Classification/methods

12.

RDscan: Extracting RNA-disease relationship from the literature based on pre-training model.

Zhang, Yang; Yang, Yu; Ren, Liping; Ning, Lin; Zou, Quan; Luo, Nanchao; Zhang, Yinghui; Liu, Ruijun.

Methods ; 228: 48-54, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38789016

ABSTRACT

With the rapid advancements in molecular biology and genomics, a multitude of connections between RNA and diseases has been unveiled, making the efficient and accurate extraction of RNA-disease (RD) relationships from extensive biomedical literature crucial for advancing research in this field. This study introduces RDscan, a novel text mining method developed based on the pre-training and fine-tuning strategy, aimed at automatically extracting RD-related information from a vast corpus of literature using pre-trained biomedical large language models (LLMs). Initially, we constructed a dedicated RD corpus by manual curation from the literature, comprising 2,082 positive and 2,000 negative sentences, alongside an independent test dataset (comprising 500 positive and 500 negative sentences) for training and evaluating RDscan. Subsequently, by fine-tuning the Bioformer and BioBERT pre-trained models, RDscan demonstrated exceptional performance in text classification and named entity recognition (NER) tasks. In 5-fold cross-validation, RDscan significantly outperformed traditional machine learning methods (Support Vector Machine, Logistic Regression and Random Forest). In addition, we have developed an accessible webserver that assists users in extracting RD relationships from text. In summary, RDscan represents the first text mining tool specifically designed for RD relationship extraction, and is poised to emerge as an invaluable tool for researchers dedicated to exploring the intricate interactions between RNA and diseases. The RDscan webserver is freely available at https://cellknowledge.com.cn/RDscan/.

Subject(s)

Data Mining , RNA , Data Mining/methods , RNA/genetics , Humans , Machine Learning , Disease/genetics , Support Vector Machine , Software

13.

ADAM17 variant causes hair loss via ubiquitin ligase TRIM47-mediated degradation.

Wang, Xiaoxiao; Pan, Chaolan; Zheng, Luyao; Wang, Jianbo; Zou, Quan; Sun, Peiyi; Zhou, Kaili; Zhao, Anqi; Cao, Qiaoyu; He, Wei; Wang, Yumeng; Cheng, Ruhong; Yao, Zhirong; Zhang, Si; Zhang, Hui; Li, Ming.

JCI Insight ; 9(13)2024 May 21.

Article in English | MEDLINE | ID: mdl-38771644

ABSTRACT

Hypotrichosis is a genetic disorder characterized by a diffuse and progressive loss of scalp and/or body hair. Nonetheless, the causative genes for several affected individuals remain elusive, and the underlying mechanisms have yet to be fully elucidated. Here, we discovered that a dominant variant in the ADAM17 gene (a disintegrin and metalloproteinase domain 17) caused hypotrichosis with woolly hair. Adam17 (p.D647N) knockin mice mimicked the hair abnormality in patients. The ADAM17 (p.D647N) mutation led to hair follicle stem cell (HFSC) exhaustion and caused abnormal hair follicles, ultimately resulting in alopecia. Mechanistic studies revealed that ADAM17 binds directly to E3 ubiquitin ligase tripartite motif-containing protein 47 (TRIM47). The ADAM17 variant enhanced the association between ADAM17 and TRIM47, leading to an increase in ubiquitination and subsequent degradation of ADAM17 protein. Furthermore, reduced ADAM17 protein expression affected the Notch signaling pathway, impairing the activation, proliferation, and differentiation of HFSCs during hair follicle regeneration. Overexpression of Notch intracellular domain rescued the reduced proliferation ability caused by the Adam17 variant in primary fibroblast cells.

Subject(s)

ADAM17 Protein , Alopecia , Hair Follicle , Ubiquitin-Protein Ligases , ADAM17 Protein/metabolism , ADAM17 Protein/genetics , Animals , Alopecia/genetics , Alopecia/metabolism , Alopecia/pathology , Mice , Hair Follicle/metabolism , Hair Follicle/pathology , Humans , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism , Ubiquitination , Male , Signal Transduction/genetics , Tripartite Motif Proteins/metabolism , Tripartite Motif Proteins/genetics , Female , Mutation , Gene Knock-In Techniques , Cell Proliferation/genetics , Cell Differentiation/genetics , Proteolysis , Disease Models, Animal , Fibroblasts/metabolism , Receptors, Notch/metabolism , Receptors, Notch/genetics

14.

scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.

Qiu, Yushan; Guo, Dong; Zhao, Pu; Zou, Quan.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38754408

ABSTRACT

MOTIVATION: The technology for analyzing single-cell multi-omics data has advanced rapidly and has provided comprehensive and accurate cellular information by exploring cell heterogeneity in genomics, transcriptomics, epigenomics, metabolomics and proteomics data. However, because of the high-dimensional and sparse characteristics of single-cell multi-omics data, as well as the limitations of various analysis algorithms, the clustering performance is generally poor. Matrix factorization is an unsupervised, dimensionality reduction-based method that can cluster individuals and discover related omics variables from different blocks. Here, we present a novel algorithm that performs joint dimensionality reduction learning and cell clustering analysis on single-cell multi-omics data using non-negative matrix factorization that we named scMNMF. We formulate the objective function of joint learning as a constrained optimization problem and derive the corresponding iterative formulas through alternating iterative algorithms. The major advantage of the scMNMF algorithm remains its capability to explore hidden related features among omics data. Additionally, the feature selection for dimensionality reduction and cell clustering mutually influence each other iteratively, leading to a more effective discovery of cell types. We validated the performance of the scMNMF algorithm using two simulated and five real datasets. The results show that scMNMF outperformed seven other state-of-the-art algorithms in various measurements. AVAILABILITY AND IMPLEMENTATION: scMNMF code can be found at https://github.com/yushanqiu/scMNMF.
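To make the matrix-factorization idea concrete, the sketch below runs plain single-matrix non-negative matrix factorization with the standard multiplicative updates for the Frobenius objective. It is a simplification: scMNMF jointly factorizes several omics matrices with additional constraints that are not reproduced here, and the function names and toy matrix are illustrative only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy "cells x features" matrix with two underlying patterns.
V = [[1, 0, 2], [0, 1, 1], [2, 0, 4], [0, 2, 2]]
W, H = nmf(V, rank=2)
```

In a clustering setting, each row of W can be read as a cell's loading on the latent factors, and cells could then be grouped by their dominant factor.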

Subject(s)

Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Genomics/methods , Computational Biology/methods , Proteomics/methods , Metabolomics/methods , Epigenomics/methods , Multiomics

15.

msBERT-Promoter: a multi-scale ensemble predictor based on BERT pre-trained model for the two-stage prediction of DNA promoters and their strengths.

Li, Yazi; Wei, Xiaoman; Yang, Qinglin; Xiong, An; Li, Xingfeng; Zou, Quan; Cui, Feifei; Zhang, Zilong.

BMC Biol ; 22(1): 126, 2024 May 30.

Article in English | MEDLINE | ID: mdl-38816885

ABSTRACT

BACKGROUND: A promoter is a specific sequence in DNA that has transcriptional regulatory functions, playing a role in initiating gene expression. Identifying promoters and their strengths can provide valuable information related to human diseases. In recent years, computational methods have gained prominence as an effective means for identifying promoters, offering a more efficient alternative to labor-intensive biological approaches. RESULTS: In this study, a two-stage integrated predictor called "msBERT-Promoter" is proposed for identifying promoters and predicting their strengths. The model incorporates multi-scale sequence information through a tokenization strategy and fine-tunes the DNABERT model. Soft voting is then used to fuse the multi-scale information, effectively addressing the issue of insufficient DNA sequence information extraction in traditional models. To the best of our knowledge, this is the first time an integrated approach has been used in the DNABERT model for promoter identification and strength prediction. Our model achieves accuracy rates of 96.2% for promoter identification and 79.8% for promoter strength prediction, significantly outperforming existing methods. Furthermore, through attention mechanism analysis, we demonstrate that our model can effectively combine local and global sequence information, enhancing its interpretability. CONCLUSIONS: msBERT-Promoter provides an effective tool that successfully captures sequence-related attributes of DNA promoters and can accurately identify promoters and predict their strengths. This work paves a new path for the application of artificial intelligence in traditional biology.
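Soft voting, mentioned in the abstract as the fusion step, averages the class probabilities produced by several models and picks the class with the highest mean. The sketch below assumes equal weights and made-up probabilities from three hypothetical per-scale models; the actual weighting used by msBERT-Promoter is not stated in the abstract.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models and return (label, averages)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg


# Made-up [non-promoter, promoter] probabilities from three tokenization scales.
per_scale_probs = [[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]]
label, avg = soft_vote(per_scale_probs)
```

Here the second model alone would predict class 0, but the averaged probabilities favor class 1, which is the point of fusing the scales rather than taking a hard majority.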

Subject(s)

Promoter Regions, Genetic , Computational Biology/methods , DNA/genetics , Humans , Models, Genetic , Sequence Analysis, DNA/methods

16.

Deciphering Microbial Adaptation in the Rhizosphere: Insights into Niche Preference, Functional Profiles, and Cross-Kingdom Co-occurrences.

Wang, Yansu; Zou, Quan.

Microb Ecol ; 87(1): 74, 2024 May 21.

Article in English | MEDLINE | ID: mdl-38771320

ABSTRACT

Rhizosphere microbial communities are critical factors for plant growth and vitality, and their adaptive differentiation strategies have received increasing attention but remain poorly understood. In this study, we obtained bacterial and fungal amplicon sequences from the rhizosphere and bulk soils of various ecosystems to investigate the potential mechanisms of microbial adaptation to the rhizosphere environment. Our focus encompasses three aspects: niche preference, functional profiles, and cross-kingdom co-occurrence patterns. Our findings revealed a correlation between niche similarity and nucleotide distance, suggesting that niche adaptation explains nucleotide variation among some closely related amplicon sequence variants (ASVs). Furthermore, biological macromolecule metabolism and communication among abundant bacteria increase under rhizosphere conditions, suggesting that bacterial function is trait-mediated in terms of fitness in new habitats. Additionally, our analysis of cross-kingdom networks revealed that fungi act as intermediaries that facilitate connections between bacteria, indicating that microbes can modify their cooperative relationships to adapt. Overall, the evidence for rhizosphere microbial community adaptation, via differences in genetic, functional, and co-occurrence patterns, elucidates the adaptive benefits of genetic and functional flexibility of the rhizosphere microbiota through niche shifts.

Subject(s)

Adaptation, Physiological , Bacteria , Fungi , Microbiota , Rhizosphere , Soil Microbiology , Fungi/genetics , Fungi/classification , Fungi/physiology , Bacteria/genetics , Bacteria/classification , Bacteria/metabolism , Bacteria/isolation & purification , Ecosystem , Bacterial Physiological Phenomena

17.

Integrated convolution and self-attention for improving peptide toxicity prediction.

Jiao, Shihu; Ye, Xiucai; Sakurai, Tetsuya; Zou, Quan; Liu, Ruijun.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38696758

ABSTRACT

MOTIVATION: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS: We provide here a novel computational approach, CAPTP, which leverages the power of convolution and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 both in cross-validation and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.

Subject(s)

Computational Biology , Peptides , Peptides/chemistry , Computational Biology/methods , Humans , Amino Acid Sequence , Algorithms , Software
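
The abstract above reports performance as a Matthews correlation coefficient (MCC). As a reminder of what that metric measures, here is a minimal sketch of the standard MCC formula computed from binary confusion-matrix counts. The function name and the example counts are illustrative assumptions; they are not taken from the CAPTP code or its results.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention,
    since the coefficient is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for a toxic / non-toxic classifier (not from the paper).
    print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction) and uses all four cells of the confusion matrix, which is why it is often preferred over accuracy on imbalanced datasets.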

18.

DeepAVP-TPPred: identification of antiviral peptides using transformed image-based localized descriptors and binary tree growth algorithm.

Ullah, Matee; Akbar, Shahid; Raza, Ali; Zou, Quan.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38710482

ABSTRACT

MOTIVATION: Despite the extensive manufacturing of antiviral drugs and vaccination, viral infections continue to be a major human ailment. Antiviral peptides (AVPs) have emerged as potential candidates in the pursuit of novel antiviral drugs. These peptides show vigorous antiviral activity against a diverse range of viruses by targeting different phases of the viral life cycle. Therefore, the accurate prediction of AVPs is an essential yet challenging task. Lately, many machine learning-based approaches have been developed for this purpose; however, their limited capabilities in terms of feature engineering, accuracy, and generalization restrict these methods. RESULTS: In the present study, we aim to develop an efficient machine learning-based approach for the identification of AVPs, referred to as DeepAVP-TPPred, to address the aforementioned problems. First, we extracted two new transformed feature sets using our designed image-based feature extraction algorithms and integrated them with an evolutionary information-based feature. Next, these feature sets were optimized using a novel feature selection approach called the binary tree growth algorithm. Finally, the optimal feature space from the training dataset was fed to a deep neural network to build the final classification model. The proposed model, DeepAVP-TPPred, was tested using stringent 5-fold cross-validation and two independent dataset testing methods; it achieved the maximum performance and showed enhanced efficiency over existing predictors in terms of both accuracy and generalization capabilities. AVAILABILITY AND IMPLEMENTATION: https://github.com/MateeullahKhan/DeepAVP-TPPred.

Subject(s)

Algorithms , Antiviral Agents , Machine Learning , Antiviral Agents/pharmacology , Antiviral Agents/chemistry , Peptides/chemistry , Humans , Computational Biology/methods , Neural Networks, Computer

19.

TPMA: A two pointers meta-alignment tool to ensemble different multiple nucleic acid sequence alignments.

Zhai, Yixiao; Chao, Jiannan; Wang, Yizheng; Zhang, Pinglu; Tang, Furong; Zou, Quan.

PLoS Comput Biol ; 20(4): e1011988, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38557416

ABSTRACT

Accurate multiple sequence alignment (MSA) is imperative for the comprehensive analysis of biological sequences. However, a notable challenge arises as no single MSA tool consistently outperforms its counterparts across diverse datasets. Users often have to try multiple MSA tools to achieve optimal alignment results, which can be time-consuming and memory-intensive. While the overall accuracy of certain MSA results may be lower, there could be local regions with the highest alignment scores, prompting researchers to seek a tool capable of merging these locally optimal results from multiple initial alignments into a globally optimal alignment. In this study, we introduce Two Pointers Meta-Alignment (TPMA), a novel tool designed for the integration of nucleic acid sequence alignments. TPMA employs two pointers to partition the initial alignments into blocks containing identical sequence fragments. It selects the blocks with the highest sum-of-pairs (SP) scores and concatenates them into an alignment whose overall SP score is superior to that of the initial alignments. Through tests on simulated and real datasets, the experimental results consistently demonstrate that TPMA outperforms M-Coffee in terms of aSP, Q, and total column (TC) scores across most datasets. Even in cases where TPMA's scores are comparable to those of M-Coffee, TPMA exhibits significantly lower running time and memory consumption. Furthermore, we comprehensively assessed all the MSA tools used in the experiments, considering accuracy, time, and memory consumption. We propose accurate and fast combination strategies for small and large datasets, which streamline the user tool selection process and facilitate large-scale dataset integration. The dataset and source code of TPMA are available on GitHub (https://github.com/malabz/TPMA).

Subject(s)

Algorithms , Nucleic Acids , Sequence Alignment , Coffee , Software
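
The abstract above describes choosing, among candidate blocks that cover the same sequence fragments, the one with the best sum-of-pairs (SP) score. The sketch below illustrates only that scoring-and-selection idea. It is not the TPMA implementation: the function names and the scoring values (match +1, mismatch -1, residue-versus-gap -2, gap-versus-gap 0) are assumptions chosen for illustration, and the two-pointer partitioning step is not shown.

```python
from itertools import combinations


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of aligned characters (scoring values are assumed)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(block: list[str]) -> int:
    """Sum-of-pairs score: add pair scores over every column and every row pair."""
    if len({len(row) for row in block}) > 1:
        raise ValueError("all rows of an alignment block must have equal length")
    return sum(
        pair_score(a, b)
        for column in zip(*block)
        for a, b in combinations(column, 2)
    )


def best_block(candidates: list[list[str]]) -> list[str]:
    """Pick the candidate block with the highest SP score."""
    return max(candidates, key=sp_score)


if __name__ == "__main__":
    # Two hypothetical alignments of the same fragments (ACT and ACGT).
    block_a = ["AC-T", "ACGT"]
    block_b = ["ACT-", "ACGT"]
    print(sp_score(block_a), sp_score(block_b))
    print(best_block([block_a, block_b]))
```

Because both candidates contain the same ungapped fragments, their SP scores are directly comparable, which is what makes per-block selection meaningful.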

20.

Multi-kernel Learning Fusion Algorithm Based on RNN and GRU for ASD Diagnosis and Pathogenic Brain Region Extraction.

Chen, Jie; Zhang, Huilian; Zou, Quan; Liao, Bo; Bi, Xia-An.

Interdiscip Sci ; 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38683281

ABSTRACT

Autism spectrum disorder (ASD) is a complex, severe disorder related to brain development. It impairs patients' language communication and social behaviors. In recent years, ASD research has focused on single-modal neuroimaging data, neglecting the complementarity between multi-modal data. This omission may lead to poor classification. Therefore, it is important to study multi-modal data of ASD to reveal its pathogenesis. Furthermore, recurrent neural networks (RNNs) and gated recurrent units (GRUs) are effective for sequence data processing. In this paper, we introduce a novel framework for a Multi-Kernel Learning Fusion algorithm based on RNN and GRU (MKLF-RAG). The framework utilizes RNN and GRU to provide feature selection for data of different modalities. These features are then fused by the MKLF algorithm to detect the pathological mechanisms of ASD and extract the Regions of Interest (ROIs) most relevant to the disease. The MKLF-RAG proposed in this paper has been tested in a variety of experiments with the Autism Brain Imaging Data Exchange (ABIDE) database. Experimental findings indicate that our framework notably enhances the classification accuracy for ASD. Compared with other methods, MKLF-RAG demonstrates superior efficacy across multiple evaluation metrics and could provide valuable insights into the early diagnosis of ASD.
